1 OVERVIEW

This  dissertation  reports  the  proofs  by  structural  induction  of  two  compilers.  The  larger 
is a compiler from a subset of pure Lisp into a machine language, and is similar to a compiler
proved by London [London72]. The smaller is the compiler proved by McCarthy and Painter
[McCarthy67].  In  addition  a portion  of  another  compiler  is  proved.  It  is  the  compiler  given 
by  Wirth  [WirthTS]  to  compile  a simple  Pascal-like  language  (PL/0)  into  code  for  a 
hypothetical  stack-oriented  machine.  We  here  employ  several  methods  of  proof,  different 
from  the  previous  methods,  which  are  amenable  to  machine-aided  proofs.  These  new 
methods  include  the  use  of  Hoare  proof  rules  [Hoare69]  to  describe  the  semantics  of  the 
source  and  target  languages,  as  well  as  the  language  in  which  the  compiler  was  written,  a new 
formalization  of  substitution,  and  axiomatic  definition  of  functions,  both  in  the  compiler  and 
in  the  assertion  language.  The  machine  assistance  was  provided  by  the  Xivus  program 
verification  system,  which  is  a later  version  of  the  system  described  by  Good,  London,  and 
Bledsoe  [Good75].  We  believe  the  methods  of  compiler  proof  given  here  are  of  general 
application  in  the  proving  of  compilers  including  such  source  language  features  as  assignment, 
conditional,  repetition  (WHILE),  and  go  to  statements,  recursive  and  non-recursive  function 
calls,  and  statement-  or  expression-oriented  languages. 

In  Chapter  2 the  problem  of  proving  correctness  of  compilers  is  outlined.  The  methods 
of proof used in these compiler proofs are given in detail in Chapter 3. The statement of
precisely  what  was  proved  about  the  compilers  is  developed  in  Chapter  4,  along  with  the 
shorter  of  the  proofs  and  the  portion  of  the  PL/0  compiler  proof.  The  major  portions  of  the 
longer proof are left to the appendix. Previous proofs of compilers are described in Chapter
5, which concludes with a discussion of the significance of our approach in relation to
previous work. Chapters 6 and 7 present the conclusions reached in completing the compiler
proofs  and  areas  for  future  research,  respectively.  References  cited  in  the  text  follow  the  last 
chapter. 

2 PROBLEM STATEMENT


2.1  Need  for  Proving  Programs 

Recent  experience  with  computing  systems  has  shown  a startling  portion  of  time  and 
cost  to  be  expended  on  efforts  to  make  software  work  correctly.  The  case  has  been  presented 
several times, for example in London [London75], that program proving is a valuable tool to
aid  this  effort.  We  will  therefore  restrict  the  remainder  of  this  chapter  to  examining  the 
motivation  for  applying  this  tool  to  the  compiler  correctness  problem. 


2.2  Importance  of  Compiler  Correctness 

This  dissertation  involves  the  application  of  correctness  proofs  to  compilers.  Why 
compilers?  First,  compilers  are  heavily  used.  Indeed  in  many  installations  nearly  every 
program  run  is  processed  by  a compiler.  So  the  existence  of  errors  in  compilers  will  likely  cost 
more  than  errors  in  less-used  programs. 

Second,  an  error  in  a compiler  is  difficult  for  the  user  to  distinguish  from  his  own 
programming  errors,  thereby  increasing  his  debugging  costs.  Often  the  user’s  only  means  of 
remedying  a problem  caused  by  a faulty  compiler  is  to  complicate  his  own  program 
unnecessarily.  This  can  come  about  because  debug  time  and  funds  are  no  longer  available  for 
a "working"  compiler,  or  because  of  insufficient  local  support  for  a compiler  that  was  written 
elsewhere. In any case, the user often must program around compiler problems and end up
with a program degraded in understandability and maintainability. The problem is
compounded because most computer users will not or cannot become familiar with the
internal  workings  of  a compiler,  or  with  the  target  code  produced.  Yet  they  must  use  the 
compiler  in  order  to  write  programs  in  the  compiler’s  source  language.  So  users  must  turn  to  a 
compiler  expert,  if  one  is  available,  thus  requiring  at  least  two  persons  rather  than  one  to 
correct  compiler  errors.  While  this  problem  is  shared  by  any  program  used  by  other  than  the 
author,  compilers  are  probably  the  most  prominent  example  of  it  (perhaps  second  to  operating 
systems). 

Further,  the  fact  that  many  programmers  usually  work  on  creating  a compiler  means 
that  a compiler  is  subject  to  the  subtle  kinds  of  problems  caused  by  bad  communication 
between  parts  of  the  compiler  written  by  different  persons.  These  interfacing  problems  could 
be found and eliminated by proving compilers, or possibly by just designing compilers with
proving in mind. While it may be pointed out that compilers proved in the past have been of
much  smaller  scale  than  this,  we  believe  that  continuing  work  on  proving  compilers, 
particularly  with  machine  aid,  will  eventually  allow  much  larger  ones  to  be  proved. 

Any  proof  that  a user’s  program  is  correct  can  be  invalidated  if  an  error  exists  in  the 
compiler  he  is  using.  This  is  so  because  such  proofs  are  usually  based  on  a specification  of 
what  the  source  language  of  the  compiler  is  intended  to  do,  not  of  what  the  compiler  actually 
does  with  source  language  programs. 

2.3 Contrast with Non-compiler Proofs

What  makes  compiler  proofs  a different  problem  from  proving  nearly  all  other 
programs?  There  is  not  just  a single  correct  (target  code)  result  of  compiling  a program. 
Thus  the  statement  of  the  compiler's  task  must  state  not  that  the  correct  answer  results  from 
the  compiler,  but  rather  that  the  result  from  the  compiler  has  certain  properties.  Further, 
these  properties  relate  to  the  results  of  executing  the  compiler  output,  thereby  placing  a second 
level  of  execution  into  the  proof.  Those  properties  will  be  the  equivalent  (in  target  language 
terms)  of  the  properties  that  the  source  language  program  has.  Another  complication  is 
determining  exactly  what  the  properties  are  that  must  be  preserved  during  compiling.  In  some 
languages, such as pure Lisp, this includes only that the same value is returned, while in other
languages,  such  as  Algol,  the  correspondence  between  all  variable  names  and  values  must  be 
preserved,  as  well  as  the  values  of  certain  hidden  entities,  such  as  I/O  files,  recursion  stacks, 
and  returned  values  of  functions. 

This, of course, means that the proof of correctness of the compiler must include a
description  of  the  meaning  or  semantics  of  both  the  source  and  target  languages  in  addition  to 
the  usual  semantics  of  the  programming  language  (of  the  compiler).  We  will  need  to  express 
by  symbolic  means  part  of  the  target  language  code  that  corresponds  to  any  non-terminal 
source  language  syntactic  type  (one  that  may  contain  other  syntactic  types  as  parts).  For 
example,  a piece  of  target  code  may  be  referred  to  as  "the  compilation  of  the  first  argument." 
This  arises  naturally  from  the  general  plan  of  the  proof,  which  uses  structural  induction 
[Burstall69a], or induction on the source language structural parts. In other types of program
proofs  that  involve  the  application  of  Hoare  rules,  we  generally  apply  the  rules  to  assertions 
and  code,  both  of  which  are  given  completely  without  the  use  of  names  or  symbols  to  represent 
parts  of  assertions  or  code.  But  here  we  will  apply  Hoare  rules  to  symbolically  represented 
code  and  assertions.  We  must  use  a notation  to  show  the  operations  of  the  Hoare  rule  (usually 
substitution  of  expressions  for  names)  being  applied  to  symbolically  expressed  pieces  of  code. 
This  is  then  a further  level  of  symbolism  which  complicates  compiler  proofs. 

Further  problems  with  compilers  arise  because  they  are  basically  non-numeric  programs; 
they  are  generally  oriented  toward  character  string  handling,  since  most  computer  languages 
are  expressed  in  character  strings,  while  much  previous  program  proving  work  has  been  on 
numeric  programs.  Also,  we  have  the  recursive  nature  of  compiler  source  languages.  Even 
the  simplest  of  languages  defines  allowable  expressions  in  recursive  terms,  while  many 
languages  also  have  recursive  definitions  of  the  allowable  statement  structure.  Thus  parts  of 
the  input  to  a compiler  can  be  of  any  finite  size,  requiring  the  use  of  induction  on  that  size  to 
prove  that  the  compiler  will  handle  the  input,  no  matter  how  large. 

To  make  matters  worse,  most  target  languages  do  not  have  recursion,  or  at  best  have 
only  a stack  to  accomplish  all  forms  of  recursion.  Thus,  compilers  often  have  an  undoing  or 
transforming  of  recursion  within  them,  complicating  their  proof  of  correctness. 

2.4  Correct  Input  Assumed 

We  will  always  assume  that  a source  language  program  is  a legally  executable  program. 
Finding  and  preventing  compile-time  and  run-time  errors  in  the  source  program  is  a problem 
which  we  will  not  address.  We  simply  wish  to  show  that  a compiled  program  will  have  the 
same  effect  as  its  assumed  correct  source  would. 




The style of these proofs could, however, be used to prove a compiler which did error
checking of the source code at compile time and produced target code that checks for errors
at run time. In order to accomplish compile-time checking it would be necessary to prove a
separate case of input syntactic type, that of "none of the above types." An indication of error
would then be specified as the meaning of such a type. For run-time checking (for example,
division by zero) it would be necessary to define (within the source language Hoare rules) the
error  conditions  and  their  results.  Then  it  would  be  proved  during  the  compiler  proof  that 
the  proper  error  indications  were  produced  under  appropriate  error  conditions  at  target 
language  run  time. 




3 METHODS OF PROOF


3.1  Introduction 

The larger compiler proof, which is for the most part given in the appendix, uses many
methods and techniques, some new and others previously applied in program proving. The
following sections describe each method and where it is applied in this
compiler proof. The significance of the new methods and their relation to previous work are
presented in Section 5.4.


3.2  Comparison  to  CO 

The compiler proved here is a modification of one called CO, which was written by
John McCarthy and proved by London [London72]. We will refer to it as MCO (for modified
CO). The following changes were made to make Pascal the language in which the compiler is
written. Compiler CO was written in Rlisp, but the Xivus program verifying system, which
provided the machine aid, requires programs to be written in (slightly extended) Pascal. The
translation was intended to perform exactly the same functions as the original.

1. Conditional expressions were removed. Thus A := IF B THEN C ELSE D becomes
IF B THEN A := C ELSE A := D.

2.  All  gensym  function  calls  were  made  into  procedure  calls,  returning  the  value  as  a new 
variable parameter. The reasons for this are fully explained in Section 3.17.

3.  All  lambda  expressions  were  expanded  fully.  Since  gensyms  were  previously  removed 
from  expressions,  this  expansion  did  not  result  in  any  single  gensym  calls  being 
duplicated  into  multiple  calls.  Again,  see  Section  3.17  for  further  explanation. 

4.  Add  sufficient  parentheses  on  single  argument  functions  because  Rlisp  allows  such 
parentheses  to  be  omitted. 

5. Change the assignment operator from ← to := .

6. Declare the argument FLG as Boolean type and set it to FALSE and TRUE rather
than NIL and non-NIL.

7.  Express  Lisp  list  notation  within  quoted  constants  as  CONSes  of  quoted  constants. 
Thus  ’(A  B)  becomes  CONS(’A,’B). 

8. Expand n argument functions into nested two argument functions. Thus LIST(A,B,C)
becomes CONS(A,CONS(B,CONS(C,NIL))) and APPEND(A,B,C) becomes
APPEND(A,APPEND(B,C)). For consistency, even one or two argument LIST calls
were expanded to CONSes. Some simplifications were applied, such as
APPEND(CONS(A,NIL),B) becoming CONS(A,B).

9. Appropriate Pascal function headers and argument lists with argument types were
added.  Also  BEGIN  END;  was  placed  around  function  bodies. 

10.  Infix  dot  is  translated  to  prefix  CONS. 

11. EQ becomes = .
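
The expansion described in item 8 can be sketched in a few lines. This is only an illustration written in Python, not the translation procedure actually used; the helper names expand_list and expand_append, and the choice to render calls as strings, are assumptions made here.

```python
def expand_list(args):
    """Rewrite LIST(a1,...,an) as nested two-argument CONS calls ending in NIL."""
    result = "NIL"
    for arg in reversed(args):
        result = f"CONS({arg},{result})"
    return result


def expand_append(args):
    """Rewrite APPEND(a1,...,an), n >= 2, as nested two-argument APPEND calls."""
    if len(args) < 2:
        raise ValueError("APPEND needs at least two arguments")
    result = args[-1]
    for arg in reversed(args[:-1]):
        result = f"APPEND({arg},{result})"
    return result


print(expand_list(["A", "B", "C"]))    # CONS(A,CONS(B,CONS(C,NIL)))
print(expand_append(["A", "B", "C"]))  # APPEND(A,APPEND(B,C))
```

Both helpers build the nesting from the right, which is why the last argument ends up innermost.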


The  following  changes  were  made  for  reasons  other  than  the  change  of  compiler 
language,  and  such  reasons  are  explained  under  each  change.  Again  these  changes  were 
intended  to  preserve  the  functions  of  the  original  compiler,  that  is,  not  change  the  output  of 
the  compiler,  except  for  the  item  involving  the  removal  of  an  optimization. 

1. All quotes on constants were replaced by the letter Q. Our Pascal parser does not
recognize any form of alphanumeric constants. Similarly NIL becomes QNIL and, for
consistency, T becomes QT. In every case this resulted in a new identifier.

2.  Various  expressions  were  grouped  as  separate  functions  for  clarity  in  reading  the  code 
and to separate logically separate tasks. The resulting new functions are RETRIEVE,
RPOP,  and  ADDIDS. 

3.  LOCTABLE  was  used  as  a more  descriptive  name  than  VPR  for  the  variable  holding 
the  location  table. 

4. CO compiles Boolean expressions nested inside other Booleans differently than if not so
nested. This optimization, which jumps to a given label according to the Boolean value
instead of producing a result to be used by a conditional jump, was removed. Thus
COMPEXP (the function that compiles an expression) is called and a conditional jump
statement is used in all places that formerly called COMBOOL (the function for
optimized compiling of a Boolean) except the call inside COMPEXP. Then
COMBOOL is never called with an atomic expression, so the first few lines of code
therein may be discarded. Similarly the final ELSE clause of COMBOOL is never
satisfied and may be omitted. The argument FLG is always FALSE in the remaining
calls to COMBOOL, so FLG may be deleted and the code of COMBOOL simplified.

5.  A new  variable  N was  introduced  to  shorten  the  form  resulting  from  expanded  lambda 
expressions. 

6.  Function  definitions  being  compiled  are  broken  into  their  syntactic  types  inside  compiler 
function  COMP  rather  than  before  the  call  to  COMP.  We  thought  the  compiler  was 
more  consistently  broken  into  tasks  this  way. 

7. When adding assertions to MCO it was found necessary to refer in some places to target
code already produced. But often some target code was an argument to a call to
APPEND in a higher level routine, and thus not available in the routine being asserted.
The solution to this is to pass a new argument (the output file, or briefly "OUTFILE")
into nearly every routine in order to have newly produced target code appended to its
end (the right-hand end if viewed as a Lisp list). In fact we want a new type here that
resembles Lisp lists except that it has only one operation available in the compiler code,
namely CONSing onto the "wrong" end. We will call this type FILE and allow
RIGHTCONS as the only operation in the code. Assertions may, however, dissect
FILEs. Since we now have a variable argument, many of the compiler functions must
now be expressed as procedures. Furthermore, all appends of code are now incorrectly
typed and unnecessary, and are replaced by either a call of a procedure that
RIGHTCONSes the code onto the OUTFILE, or else an assignment to OUTFILE of a
RIGHTCONS expression.

8.  THEN  and  ELSE  clauses  are  reversed  and  the  IF  conditions  negated  to  allow  dropping 
null  ELSE  clauses  in  MKPUSH,  COMPLIS,  and  LOADAC. 
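
The FILE type of item 7 can be pictured with a small sketch. It is written in Python rather than the Pascal of MCO, and the class name OutFile, the method contents, and the procedure emit_movei are hypothetical names introduced only for this illustration.

```python
class OutFile:
    """Sketch of the FILE type: compiler code may only add to the right end."""

    def __init__(self, items=()):
        self._items = tuple(items)

    def rightcons(self, item):
        """Return a new FILE with item appended at the right-hand end."""
        return OutFile(self._items + (item,))

    def contents(self):
        """For assertions only: expose the items so they can be inspected."""
        return self._items


def emit_movei(outfile, register, value):
    """Hypothetical code-emitting procedure: appends one instruction to the file."""
    return outfile.rightcons(("MOVEI", register, value))


out = OutFile()
before = out.contents()
out = emit_movei(out, 1, 0)
# An assertion may relate the new OUTFILE to its value on entry.
assert out.contents() == before + (("MOVEI", 1, 0),)
```

Because every routine receives the file and can only extend it on the right, an assertion inside a routine can refer to all target code produced so far.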

Although  this  compiler  was  reasonably  well  structured  in  its  original  form,  there  were 
many  changes  made  to  increase  its  understandability,  separation  of  tasks,  and  ease  of 
expressing  assertions.  More  could  have  been  made.  This  again  lends  credence  to  two  often 
expressed  program  proving  philosophies: 

1. The better situation for proving programs is to create the program and its proof
together, not to add on proving afterwards as was done here.

2. Creating a program with the intention of adding assertions to it and proving it
results in a more understandable program (many would hold this true even if
proving is not carried out).


3.3  Inductive  Assertions 

The inductive assertion method of proving programs was introduced by Naur [Naur66]
and Floyd [Floyd67]. Naur proposed the term "snapshots" for what Floyd and subsequent
authors term assertions. Their method involves making assertions which state what is to be
true each time execution passes through given points in a program. Then for all pairs of
assertions connected by an executable path of statements, it is proved that the final assertion
must be true after execution of those statements, provided the initial assertion was true before.
The assertion reached at the conclusion of execution (called the output or Exit assertion) is
then a specification of what the program accomplishes. We here assume that the program
does indeed terminate. Proving what is true of a program making this assumption of
termination is called a partial correctness proof. Partial correctness by using inductive
assertions is the method used in proving MCO. A good review of the method and other
aspects of program proving is Elspas et al. [Elspas72].
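
As a small illustration of where such assertions sit in a program, consider the following sketch. It is written in Python, and the function sum_to is an example invented here, not part of MCO. The run-time assert statements only check the assertions on the runs actually made; the inductive assertion method would instead prove, for each path between two assertions, that the second follows from the first.

```python
def sum_to(n):
    """Sum 0 + 1 + ... + n, annotated with inductive assertions."""
    assert n >= 0                                      # entry assertion
    total, i = 0, 0
    while i < n:
        # loop assertion: must hold every time execution reaches this point
        assert total == i * (i + 1) // 2 and i <= n
        i += 1
        total += i
    assert total == n * (n + 1) // 2                   # exit assertion
    return total


print(sum_to(4))  # 10
```

The paths to be proved are entry to loop assertion, loop assertion around the body back to itself, and the paths from entry or loop assertion to the exit assertion.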


3.4 Hoare Rule Semantics

Hoare [Hoare69] introduced a method, now called Hoare proof rules, of precisely
defining semantics of program statements. Appearing in Hoare proof rules is Hoare’s notation
P{S}Q, which means that if assertion P is true before a program part S is executed, and if
execution of S terminates, then the assertion Q is true afterward. Another notation used by
Hoare is

\[ \frac{A,\; B}{C} \]

which  is  a rule  of  inference  that  allows  us  to  deduce  C if  A and  B have  been  proved.  The 
Hoare  proof  rules,  written  in  these  notations,  then  take  the  form  of  axioms  or  rules  of 
inference  for  the  various  programming  constructs.  We  can  then  define  the  semantics  of  a 
statement  just  by  stating  what  can  be  proved  about  the  statement  using  its  Hoare  proof  rule. 
The idea that specification of proof techniques provides a programming language definition is
a main point of Floyd's paper significantly titled "Assigning meanings to programs" [Floyd67].
The Hoare proof rule method is closely related to the inductive assertion method, and is
sometimes  referred  to  as  the  Floyd-Hoare  approach. 

Nearly  all  of  the  languages  Pascal  [Hoare73]  and  Euclid  [London77]  have  been  defined 
in terms of such rules. For programs written in a language so described, it is a mechanical
procedure to transform a program containing assertions into a set of logical theorems. The
theorems are called verification conditions, and the mechanical process is known as
verification  condition  generation.  The  proof  of  these  verification  conditions  shows  that  the 
program  is  consistent  with  its  assertions,  which  are  taken  to  be  the  definition  of  correctness  for 
the  program. 

To  prove  a compiler  we  must  specify  the  semantics  of  not  only  the  language  in  which  it 
is  written,  but  also  the  source  language  and  the  target  language.  We  have  chosen  to  use 
Hoare  proof  rules  to  express  the  semantics  of  all  three  languages.  The  reasoning  behind  these 
three  choices  is  as  follows. 

We  wish  to  show  what  the  effects  are  of  executing  the  compiler.  Toward  this  end  we 
will  use  verification  condition  generation,  applying  Hoare  proof  rules  to  the  program 
statements  of  the  compiler.  Therefore  we  will  express  the  semantics  of  the  language  in  which 
the  compiler  is  written  by  means  of  Hoare  proof  rules,  making  those  semantics  directly  and 
easily  applicable  in  the  proof. 

What  we  wish  to  prove  about  a compiler  is  that  the  target  language  code  that  it 
produces  has  the  same  effect  as  the  source  language  does  (or  would  have  if  source  language 
were directly executed). We will do this by proving, for each Hoare proof rule P{S}Q in the
source language (rules of inference will be handled similarly), a Hoare formula in target
language terms roughly of the form

compilation(P) {compilation(S)} compilation(Q).    (*)

The compilation of S is simply the target code output by the compiler when it compiles the
source language code S, and it usually consists of several target language instructions. The
compilation of assertions P and Q means the result of translating those assertions into target
language  terms  in  a manner  similar  to  what  the  compiler  does  to  source  language  code.  This 
translation  involves  changing  variable  names  into  locations  and  source  language  constants  into 
their  target  language  representation  while  retaining  the  function  names  and  structure  of  the 
assertion  expressions. 

A natural  way  to  express  source  language  semantics  is  with  Hoare  rules  when  we  use 
the  formulation  of  the  correctness  of  the  compiler  as  expressed  by  formula  (*)  above.  Any 
other way would, of course, require translation to Hoare rule form for use in the formula (*).
Then we may prove the target language formula resulting from this statement of correctness
by applying the Hoare rules (for the various target language instructions in compilation(S)) to
produce a logical theorem which we will then prove. Such proof for every possible form of
compilation(S), corresponding to all possible syntactic forms of S, will constitute proof of the
compiler. This points to the use of Hoare rules as the choice to express the semantics of the
target  language  also.  We  will  give  the  target  language  Hoare  rules  used  to  prove  MCO  after 
giving  some  further  notation  involving  lists  and  substitutions. 


3.5  List  Notation 

We  believe  that  the  following  expression,  which  arises  in  the  proof  of  MCO,  is 
unreadable. 

OUTFILE
= APPEND(OUTFILE’,
    APPEND(FCOMPANDOR(CDR(EXP), M, L1, FALSE, LOCTABLE),
      CONS(CONS(’MOVEI,
                CONS(1, CONS(CONS(’QUOTE, CONS(’T, ’NIL)),
                             ’NIL))),
        CONS(CONS(’JRST, CONS(0, CONS(L2, ’NIL))),
          CONS(L1,
            CONS(CONS(’MOVEI,
                      CONS(1, CONS(0, ’NIL))),
              CONS(L2, ’NIL)))))))

Similar  expressions  appear  to  be  even  more  unreadable  under  any  of  the  conditions: 

1. A longer expression is used (and many longer ones occur in the proof of MCO),

2. Straight Lisp notation for functions is used instead of arguments separated by
commas, or

3. Careful indentation is not used.

Consequently  it  was  decided  that  a better  notation  must  be  used  to  make  more  readable 
the  assertions  about  lists.  A notation  was  developed  that  borrows  much  from  the  clisp  list 
construction notation [TeitelmanTb, p. 23.10]. Functions which are not considered intrinsic to
the  source  language  are  expressed  in  conventional  prefix  with  arguments  separated  by 
commas,  while  the  intrinsic  list  building  functions  use  the  clisp  style  of  notation.  Briefly,  that 
notation  builds  lists  that  are  begun  and  ended  with  angle  brackets:  < >.  Then  each  item  in 
the brackets is a member of that list, except that items prefixed by ! are lists whose elements
are to be placed in the list.

As  in  Lisp,  unquoted  items  are  to  be  evaluated  while  a prefixed  single  quote  means  not 
to do so. Assume F(X,Y) returns the Lisp value ’(A B). Then in the list notation used here,

< F(X,Y) ’C > means ’((A B) C) while < !F(X,Y) ’C > means ’(A B C).

When  Lisp  dotted  pairs  are  expressed,  we  will  use  < ’A  . ’B  > to  mean  ’(  A . B ). 
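
The difference between a plain item and an item prefixed by ! can be sketched as follows. This is an illustration in Python, not the mechanical translator used for the proof; the names Splice and build are assumptions made here, and Python lists stand in for Lisp lists.

```python
class Splice:
    """Marks an item written with a leading '!': its elements are spliced in."""

    def __init__(self, items):
        self.items = list(items)


def build(*items):
    """Build a list the way < ... > does: plain items become members,
    Splice items contribute each of their elements."""
    result = []
    for item in items:
        if isinstance(item, Splice):
            result.extend(item.items)
        else:
            result.append(item)
    return result


f_xy = ["A", "B"]                    # stands for the value '(A B) returned by F(X,Y)
print(build(f_xy, "C"))              # [['A', 'B'], 'C']
print(build(Splice(f_xy), "C"))      # ['A', 'B', 'C']
```

In Lisp terms, a plain item corresponds to a CONS of the item onto the rest of the list, while a prefixed item corresponds to an APPEND.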

The original example expression in this section may be expressed now as:

OUTFILE =
< !OUTFILE’
  !FCOMPANDOR(CDR(EXP), M, L1, FALSE, LOCTABLE)
  < ’MOVEI 1 < ’QUOTE ’T > >
  < ’JRST 0 L2 >
  L1
  < ’MOVEI 1 0 >
  L2
>


This  notation  was  used  in  proving  MCO  for  writing  assertions  which  were  then 
mechanically  translated  to  a form  like  the  original  example  in  this  section  for  machine  use. 


3.6 Formalizing Substitution

The Hoare proof rule for an assignment statement is (ignoring such complications as
function calls that might be in the statement):

\[ Q\big|^{a}_{b} \;\{\, a := b \,\}\; Q \]

The vertical line notation means to substitute b for all free occurrences of a in the
formula Q. A standard caution we note here is that the expression b must not have free uses
of any variables that become bound when introduced into Q. Of course, renaming of the
bound variable can be used to avoid this problem.
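
As a small worked instance of the assignment rule (the particular assertion and statement are chosen here only for illustration), take Q to be x > 0, a to be x, and b to be x + 1. Substituting b for the free occurrences of a in Q gives the precondition:

```latex
\[
  (x > 0)\big|^{x}_{x+1} \;=\; (x + 1 > 0),
  \qquad\text{so}\qquad
  (x + 1 > 0)\;\{\, x := x + 1 \,\}\;(x > 0).
\]
```

That is, x > 0 holds after the assignment whenever x + 1 > 0 held before it.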

When applying similar Hoare rules during a compiler proof, it will often be the case
that Q will be symbolically expressed. In that case we will not be able to write the result of
such a substitution, but will leave it in the vertical line notation and apply further Hoare rules
involving substitutions on it. Therefore we have extended this notation for various types of
multiple substitutions.

\[ Q\big|^{a}_{b}\big|^{c}_{d} \]

means to first apply the a-b substitution, then apply the c-d substitution to that result. This is
sequential substitution. We will also need notation for simultaneous substitution.

\[ Q\big|^{a\;c}_{b\;d} \]

means to simultaneously substitute b and d for a and c, respectively. This form is often used
when b contains an occurrence of c which we do not want further changed to d.

\[ Q\big|^{a}_{\,b|^{c}_{d}} \]

means to first apply the c-d substitution to b, then use that result for all free occurrences of a
in Q.
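
The three orders of substitution give different results when b contains c, as a short sketch shows. It is written in Python under simplifying assumptions made here: expressions are nested tuples of the form (operator, argument, ...), identifiers are strings, and there are no bound variables, so every occurrence is free. The function name subst is hypothetical.

```python
def subst(expr, mapping):
    """Simultaneously replace identifiers in expr according to mapping.
    Expressions are identifiers (str), constants (int), or tuples (op, arg1, ...)."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    if isinstance(expr, tuple):
        # The operator in position 0 is kept; only the arguments are rewritten.
        return (expr[0],) + tuple(subst(arg, mapping) for arg in expr[1:])
    return expr


Q = ("=", "a", "c")          # the formula  a = c
b = ("+", "c", 1)            # the expression  c + 1
d = 5

sequential = subst(subst(Q, {"a": b}), {"c": d})
simultaneous = subst(Q, {"a": b, "c": d})
nested = subst(Q, {"a": subst(b, {"c": d})})

print(sequential)    # ('=', ('+', 5, 1), 5)
print(simultaneous)  # ('=', ('+', 'c', 1), 5)
print(nested)        # ('=', ('+', 5, 1), 'c')
```

In the simultaneous case the c introduced by b is left alone, which is the situation the text describes.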

The  rules  which  we  will  present  regarding  substitutions  will  be  valid  only  when  the  item 
substituted for (for instance a in the above examples) is an identifier. The item substituted in
(that  is,  b in  the  first  example  above)  may,  however,  be  an  expression.  Occasionally  we  will 
have  a list  of  identifiers  which  we  will  represent  with  a single  name.  When  we  substitute  such 
a name,  we  will  use  a double  vertical  line  to  remind  us  that  it  is  a multiple  substitution  and 
the  single  substitution  rules  may  not  apply.  For  example,  if  x represents  the  list  <a  c e>  and  y 
the  list  <b  d f>,  then  the  form 

\[ Q\big\|^{x}_{y} \]

means

\[ Q\big|^{a}_{b}\big|^{c}_{d}\big|^{e}_{f} \]


Note  that  this  substitution  is  sequential.  When  we  have  occasion  to  express  a multiple 
simultaneous  substitution,  we  will  give  the  items  individual  names,  and  the  substitution  will 
look  something  like  this: 


\[ Q\big|^{a_1 \,\cdots\, a_n}_{b_1 \,\cdots\, b_n} \]

We  will  occasionally  resort  to  parentheses  to  clear  up  any  unusual  orders  of  substitution 
not  covered  by  these  forms. 

In order to simplify expressions involving substitutions, we wish to express the condition
that a certain name could not possibly exist (as a free occurrence) in an expression. We will
use the notation a ¬∈ e to describe this condition. Again, a must be atomic while e may be an
expression. Both a and e will often be symbolically expressed in a proof. Precisely what is
meant by a ¬∈ e is that for all possible asserted source language programs from which e may
be derived, a will never appear as a free variable name in e when e is written out fully in
source or target language variable names (no code or assertions left symbolically named). For
example, if e represents a source language expression and a represents a target language
register, we would then know that a ¬∈ e.

To assure that such statements involving ¬∈ are true, we assume that different names
are  used  for  source  language  objects  and  target  language  objects.  This  could  be  easily 
accomplished  by  renaming  to  unique  names  when  a conflict  would  occur,  but  no  actual 
conflicts  will  occur  in  the  proof  of  MCO  because  we  never  refer  to  source  variables  by  their 
names. 

There  are  four  distinct  classes  of  source  and  target  variables  in  a compiler  organized 
like MCO. Knowing that they are distinct will aid us in simplifying expressions containing
substitutions. 

1.  Source  program  variables.  We  will  usually  refer  to  these  by  expressions  which 
select  lists  of  variables  from  the  source  code  or  from  the  location  table  of  the 
compiler. 

2. Target program registers. We will refer to the registers by R1, R2, ..., Rn.

3.  Target  program  stack  pointer  P. 

4.  T arget  program  stack  locations.  We  will  always  refer  to  the  stack  locations  with 
an  array-like  subscript  notation,  calling  the  stack  memory  m.  For  example, 
m[P-fl].  Substitutions  involving  stack  locations  will  always  be  done  as  a 
replacement  of  the  entire  stack  m by  an  alpha  expression,  where  a(m,i,y)  means 
the  result  of  changing  the  ith  element  of  m to  the  value  y. 

The  fact  that  the  registers,  pointer,  and  stack  must  be  distinct  may  impose  restrictions  on 
certain  implementations  of  the  target  language  system.  For  example,  if  P,  the  stack  pointer  of 
MCO,  is  actually  implemented  as  register  fourteen,  then  proofs  of  correctness  are  not  valid  if 
register  fourteen  is  used  otherwise  in  executing  a target  program.  Since  registers  are  used  in 
the  target  code  produced  by  this  particular  compiler  to  hold  arguments  to  functions,  this 
restriction would mean that user programs containing over thirteen arguments to any function
could  not  be  correctly  compiled  and  run. 

The rules describing simplifications which may be done, henceforth called subrules, are
given in figure 3-1 at the end of this section. All D's, a's, and X's, numbered or not, are
atomic names, while the other letters may be expressions. The first two rules express the
distinctness of the four variable classes. Subrule 3 allows us to state that a variable is not-in
(¬∈) an expression if it is not-in any of the expression's component substitutions. Subrule 4
states that substituting for an item not there produces no change. Subrules 5, 8, 9, and 20
show us the equality of certain forms that will appear in the proof of MCO. Subrule 6 allows
us to distribute substitution over source language functions. It might be noted that
quantification is not a function, and that passing functions as arguments to other functions
(not allowed in the source language of MCO) may also invalidate subrule 6. Neither does this
subrule apply to the (non-source language) functions such as substitution that are applied to
assertions rather than appearing in assertions as source language functions do. Subrule 7
gives exactly the conditions under which we may change the order of multiple substitutions.
Subrule 10 tells us when simultaneous and sequential substitution produce the same result.
Subrule 11 tells us that ≠ and ¬∈ are the same for names (actually we only give the implication
in one direction; it is true in the other direction, but that is not needed here). Subrule 12 states
that an item substituted for is gone (unless it is put back in by that substitution). Subrule 13
allows us to distribute not-in (¬∈) over source language functions and the converse. The
caveats of subrule 6 apply here also. Subrule 14 states that substituting something for itself
results in no change.
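
The behaviour that several of these subrules describe can be checked on a toy representation of expressions. The following is a minimal sketch and not part of the original proof system: it assumes quantifier-free expressions, so that every occurrence of a name is a free occurrence, and the names `not_in` and `subst` are hypothetical.

```python
# Hypothetical representation: a str is an atomic name; a tuple
# (f, arg1, ..., argn) applies the source language function f.

def not_in(name, expr):
    """The not-in condition: `name` has no occurrence in `expr`."""
    if isinstance(expr, str):
        return expr != name
    return all(not_in(name, arg) for arg in expr[1:])


def subst(expr, name, repl):
    """Substitute `repl` for every occurrence of the atomic `name` in `expr`."""
    if isinstance(expr, str):
        return repl if expr == name else expr
    # Distribute over the arguments of a function, as subrule 6 allows.
    return (expr[0],) + tuple(subst(arg, name, repl) for arg in expr[1:])


G = ("PLUS", "X", ("TIMES", "Y", "X"))
H1 = ("CAR", "Z")
H2 = ("CDR", "W")

# Subrule 4: substituting for a name that is not there changes nothing.
assert not_in("R1", G) and subst(G, "R1", H1) == G
# Subrule 12: a name substituted for is gone when the replacement lacks it.
assert not_in("X", H1) and not_in("X", subst(G, "X", H1))
# Subrule 14: substituting a name for itself changes nothing.
assert subst(G, "X", "X") == G
# Subrule 7: two substitutions commute when neither name occurs in the
# other replacement and the two names differ.
assert not_in("X", H2) and not_in("Y", H1)
assert subst(subst(G, "X", H1), "Y", H2) == subst(subst(G, "Y", H2), "X", H1)
```

The sketch checks single instances only; the subrules themselves are statements about all expressions and are justified separately.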

In  a few  places  in  the  proof  of  MCO,  we  have  simultaneous  substitutions  which  do  not 
satisfy  the  criteria  of  subrule  10,  and  so  may  not  be  treated  as  sequential.  Subrule  15  is  a 
rather  complex  rule  which  allows  us  to  sequentialize  such  substitutions  under  rather  specific 
conditions.  Those  conditions  are  that  a further  substitution  was  to  be  performed  after  the 
simultaneous  one  which  would  change  all  names  introduced  in  the  simultaneous  substitutions 
into  names  with  a certain  distinctness.  The  way  we  sequentialize  then  is  to  apply  the  further 
substitution  to  each  item  of  the  simultaneous  substitution. 

Subrules 16, 17, 18 and 19 define how substitution and ¬∈ interact with universal
quantification. They are easily understood by recalling that quantification causes variables
not to be free, and thus affects substitution for free occurrences. Note that in subrule 18 we
see the caution mentioned earlier that we may not introduce a variable during substitution
that gets bound. Thus H in subrule 18 must not contain X, the variable being bound by the
quantifier.

Subrule  21  is  simply  the  carrying  out  of  a substitution.  In  fact  use  of  this  subrule  will 
often  be  referred  to  as  doing  or  carrying  out  a substitution  rather  than  being  referred  to  by 
the  number  21.  Application  of  subrules  1 and  2 will  occur  frequently  in  the  proof  of  MCO. 
When  such  application  is  quite  obvious,  the  proof  will  omit  mention  of  those  rules.  Subrule  6 
will  often  be  referred  to  by  the  term  distribution  rather  than  by  number. 

A more  rigorous  justification  for  some  of  the  subrules,  along  with  the  flavor  of  how  the 
others may be proved, is given in Section A.10.

FIGURE 3-1

Subrule 1:

a) Ri ¬∈ source language expressions

b) P ¬∈ source language expressions

c) m ¬∈ source language expressions

d) source variables ¬∈ non-source expressions

Subrule 2:

a) source variable ≠ Ri

b) source variable ≠ P

c) source variable ≠ m

d) Ri ≠ P

e) Ri ≠ m

f) P ≠ m


Subrule 3:

a) D ¬∈ G ∧ D ¬∈ H1 → D ¬∈ $G|^{D1}_{H1}$

b) D ¬∈ G ∧ D ¬∈ Hi (for 1 ≤ i ≤ n) → D ¬∈ $G|^{D1}_{H1} \ldots |^{Dn}_{Hn}$

Subrule 4:

D ¬∈ G → $G|^{D}_{H}$ = G

Subrule 5:

D2 ¬∈ G → $G|^{D1}_{H1}|^{D2}_{H2}$ = $G|^{D1}_{H1|^{D2}_{H2}}$

Subrule 6 (distribution of substitutions):

$(f\ G1 \ldots Gn)|^{D}_{H}$ = $(f\ G1|^{D}_{H} \ldots Gn|^{D}_{H})$

where f is any source language function.

Subrule 7:

a) D1 ¬∈ H2 ∧ D2 ¬∈ H1 ∧ D1 ≠ D2 → $G|^{D1}_{H1}|^{D2}_{H2}$ = $G|^{D2}_{H2}|^{D1}_{H1}$

b) Di ¬∈ Hj ∧ Di ≠ Dj (for 1 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j) →
$G|^{D1}_{H1} \ldots |^{Dn}_{Hn}$ can be permuted to any order.
Subrule 8:

D1 ≠ D2 ∧ D1 ¬∈ H2 → $G|^{D1}_{H1}|^{D2}_{H2}$ = $G|^{D2}_{H2}|^{D1}_{H1|^{D2}_{H2}}$

Subrule 9:

D1 = D2 → $G|^{D1}_{H1}|^{D2}_{H2}$ = $G|^{D1}_{H1|^{D2}_{H2}}$

Subrule 10:

a) D2 ¬∈ H1 → $G|^{D1\ D2}_{H1\ H2}$ = $G|^{D1}_{H1}|^{D2}_{H2}$

b) Di ¬∈ Hj (for 1 ≤ j < i ≤ n) → $G|^{D1 \ldots Dn}_{H1 \ldots Hn}$ = $G|^{D1}_{H1} \ldots |^{Dn}_{Hn}$

Subrule 11:

D1 ≠ D2 → D1 ¬∈ D2

Subrule 12:

D ¬∈ H → D ¬∈ $G|^{D}_{H}$

Subrule 13 (distribution of ¬∈):

a) D ¬∈ (f G1 ... Gn) → D ¬∈ Gi (for 1 ≤ i ≤ n)

b) D ¬∈ Gi (for 1 ≤ i ≤ n) → D ¬∈ (f G1 ... Gn)

where f is any source language function.

Subrule 14:

$G|^{D}_{D}$ = G

Subrule 15:

if all atoms in the b's appear in v at least once, a's are distinct from each other, and all items
in w are distinct from those in v.

Subrule 16:

a) X ¬∈ G1 → ∀X (G1 → G2) = G1 → ∀X (G2)

b) X ¬∈ G1 → ∀X (G1 ∧ G2) = G1 ∧ ∀X (G2)

c) X ¬∈ G → ∀X (G) = G

Subrule 17:

a) ∀X (∀X (G)) = ∀X (G)

b) ∀X1,...,Xn (∀X1,...,Xm (G)) = ∀X1,...,Xmax(m,n) (G)

Subrule 18:

a) X ≠ D ∧ X ¬∈ H → $\forall X\,(G)|^{D}_{H}$ = ∀X ($G|^{D}_{H}$)

b) Xi ≠ Dj ∧ Xi ¬∈ Hj (for 1 ≤ i ≤ n, 1 ≤ j ≤ m) →
$\forall X1,\ldots,Xn\,(G)|^{D1}_{H1} \ldots |^{Dm}_{Hm}$ = ∀X1,...,Xn ($G|^{D1}_{H1} \ldots |^{Dm}_{Hm}$)

Subrule 19:

$\forall X\,(G)|^{X}_{H}$ = ∀X (G)
Subrule 20:

D2 ¬∈ H2 → $G|^{D1}_{H1|^{D2}_{H2}}|^{D2}_{H2}$ = $G|^{D1}_{H1}|^{D2}_{H2}$

Subrule 21:

$D|^{D}_{H}$ = H


3.7 Hoare Rules Defining Target Language

We here give the Hoare rules defining the semantics of each of the target language
statement types that appear in the output of the compiler MCO.


MOVEI:

$Q|^{Ri}_{y}$ { < 'MOVEI i y > } Q

This rule can be viewed as an assignment statement Hoare rule. For example, the
instruction < 'MOVEI 1 0 > can be viewed as the assignment statement R1 := 0. The
following standard Hoare rule for assignment would then apply.

$Q|^{R1}_{0}$ { R1 := 0 } Q


MOVE:

$Q|^{Ri}_{m[P+j]}$ { < 'MOVE i j 'P > } Q

This rule is similar to the MOVEI rule except that we must use j as a pointer to the
memory location that holds the value assigned to the register Ri rather than using j itself as
the value. In addition we have an index register being used, which in MCO always happens
to be the stack pointer P.


JRST:

assertion(l) { < 'JRST 0 l > } Q

The JRST rule may be understood by viewing JRST as a go to. The rule states that
regardless of what is so at the point in the program located after the go to l, the assertion at
the label l must be true immediately before. It should be noted that the zero in the statement
is optional.


JUMPE:

(Ri = 0 → assertion(l)) ∧ (Ri ≠ 0 → Q) { < 'JUMPE i l > } Q

JUMPE is the equivalent of the higher level language statement
IF Ri = 0 THEN GO TO l. The standard Hoare rules for such a compound statement are:

P ∧ R { A } Q,  P ∧ ¬R → Q
--------------------------
P { IF R THEN A } Q

and

assertion(l) { GO TO l } Q

We may combine these to get the Hoare rule

P ∧ R → assertion(l),  P ∧ ¬R → Q
---------------------------------
P { IF R THEN GO TO l } Q

The rule for JUMPE is the axiomatic form of this rule of inference.


JUMPN:

(Ri ≠ 0 → assertion(l)) ∧ (Ri = 0 → Q) { < 'JUMPN i l > } Q

The JUMPN instruction is the same as JUMPE except that the register is checked to be
non-zero rather than zero.


CALL: 
$Entry(f)|^{NIL}_{0}|^{a1 \ldots an}_{R1 \ldots Rn}$ ∧
($Exit(f)|^{NIL}_{0}|^{h}_{<f\ a1 \ldots an>}$ → $\forall R2,\ldots,RN(f)\,(Q)|^{R1}_{<f\ R1 \ldots Rn>}$)
{ < 'CALL n <'E f> > } Q



where N(f) is the maximum number of registers modified during execution of function f, h is
the designation in Exit(f) for the result of the function call to f, and a1 ... an are the formal
arguments of f. This rule depends on the function linkage conventions used by compiled
code. What the rule says is that any compiled function has the same Entry and Exit
conditions as the corresponding source function, except the formal arguments are replaced by
the registers and the constant NIL is replaced by 0. Further, register R1 will contain the
function result, and the other registers used are quantified as variable parameters to the
function. Quantification is one of the standard ways of handling variable parameters, as
shown by the following higher level language Hoare rule for a call of procedure p with
variable parameter a.

P(a) { p(a) } R(a)
------------------------------------------
P(a) ∧ ∀a (R(a) → S(a)) { CALL p(a) } S(a)

The upper line of this rule of inference is the proof of the procedure body with respect
to the Entry P and Exit R. Note that the quantification makes the variable a appearing in S
and the Exit into a different variable than the a in the Entry. The quantification in our
CALL rule does not include the Exit because the source language does not have variable
parameters. Hence all references to parameters in the Exit refer to the values of the
parameters before the CALL is performed.

It might be questioned how the target code can treat the parameters of a function as if
they were variable parameters when the source language does not allow parameters to be
changed. Before the call target language statement is executed, the target code has placed the
values of the parameters into temporary locations (the registers). The target code may then
proceed to overwrite the registers during evaluation of the function just as if the registers were
variable parameters. This has no effect on the original actual parameters in the calling
function.

SUB:

$Q|^{P}_{P-n}$ { < 'SUB 'P <'C 0 0 n n> > } Q

The strange appearance of this instruction is due to the use of P as two half words.
The right half is actually the stack pointer, while the left is used for a hardware check on the
stack size. We assume the stack size is within bounds in the proof of MCO (otherwise the
program would not terminate normally), and we completely ignore the left half. Thus the
instruction may be viewed simply as the assignment P := P-n.


POPJ: 


$Exit(f)|^{h}_{R1}|^{NIL}_{0}|^{a1 \ldots an}_{R1' \ldots Rn'}$ { < 'POPJ 'P > } Q



where f is the function in which we find the POPJ instruction and h and the a's are as
previously. The POPJ instruction may be viewed as a return statement, which explains this
rule's resemblance to a return Hoare rule. As with the CALL statement, the locations in the
registers of the arguments and function result must be taken into account, and the source
constant NIL must be translated to 0.


PUSH:

$Q|^{m}_{\alpha(m,P,Ri)}|^{P}_{P+1}$ { < 'PUSH 'P i > } Q

This instruction accomplishes the same as the assignments P := P+1; m[P] := Ri. The
rule is seen to be the composition of the standard assignment rules for these two statements.
As before, the notation α(m,P,Ri) means the array which is the same as m except m[P] has
been replaced by Ri.

During the course of generating verification conditions we will assume the standard
consequence and composition Hoare rules that apply to essentially all target languages.

P → Q,  Q { A } R
-----------------
P { A } R

P { A } Q,  Q → R
-----------------
P { A } R

P { A } Q,  Q { B } R
---------------------
P { A B } R

where A B signifies that the instructions A and B are concatenated.
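
The way these axioms compose can be illustrated by treating assertions as predicates on a machine state. The sketch below is an assumption-laden illustration, not the mechanism used in the proof of MCO: statements are abbreviated to hypothetical tuples such as ("PUSH", i) and ("SUB", n), only MOVEI, MOVE, PUSH and SUB are modelled, and `alpha`, `step` and `precondition` are invented names. Substituting into Q is modelled by evaluating Q on the state that results from the assignment.

```python
def alpha(m, i, y):
    """The array equal to m except that element i holds y."""
    updated = dict(m)
    updated[i] = y
    return updated


def step(stmt, s):
    """Forward effect of one statement on a state {"R": ..., "P": ..., "m": ...}."""
    op = stmt[0]
    if op == "MOVEI":                     # Ri := y
        _, i, y = stmt
        return {**s, "R": alpha(s["R"], i, y)}
    if op == "MOVE":                      # Ri := m[P+j]
        _, i, j = stmt
        return {**s, "R": alpha(s["R"], i, s["m"][s["P"] + j])}
    if op == "PUSH":                      # P := P+1; m[P] := Ri
        _, i = stmt
        p = s["P"] + 1
        return {**s, "P": p, "m": alpha(s["m"], p, s["R"][i])}
    if op == "SUB":                       # P := P-n
        _, n = stmt
        return {**s, "P": s["P"] - n}
    raise ValueError("statement type not modelled: %r" % (op,))


def precondition(code, post):
    """Apply the axiom of each statement from last to first (composition rule)."""
    pre = post
    for stmt in reversed(code):
        # The precondition of stmt is the old assertion evaluated after stmt runs.
        pre = lambda s, q=pre, st=stmt: q(step(st, s))
    return pre


code = [("MOVEI", 1, 7), ("PUSH", 1), ("MOVE", 2, 0), ("SUB", 1)]
post = lambda s: s["R"][2] == 7 and s["P"] == 3
pre = precondition(code, post)

start = {"R": {1: 0, 2: 0}, "P": 3, "m": {}}
assert pre(start)                   # the stack pointer returns to 3
assert not pre({**start, "P": 5})   # a different initial P violates the postcondition
```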

3.8 Two Part Proof

We  divide  the  proof  of  a compiler  into  two  distinct  parts.  The  first  proves  what  target 
code  the  compiler  produces  for  each  source  language  syntactic  type.  The  second  proves  that 
the  target  language  produced  has  the  same  effect  as  the  source  language.  We  find  this 
division  convenient  because  it  divides  the  proof  into  parts  that  are  easier  to  handle.  It  seems 
to  be  a natural  division  in  that  the  two  proof  parts  are  independent  of  each  other  in  the  sense 
that  the  second  part  uses  the  result  of  the  first  part,  but  essentially  none  of  the  same  methods 
or  proof  parts  need  be  repeated. 

The semantics of the language in which the compiler is written are needed in the first
proof  part  to  transform  into  verification  conditions  the  assertions  stating  what  the  compiler 
produces.  The  first  proof  part,  however,  does  not  deal  with  the  semantics  of  the  source  and 
target  language  (although  the  syntax  of  each  is  used). 

The  second  proof  part  involves  showing  that  the  Hoare  formula  of  the  form 
compilation(P) {compilation(S)} compilation(Q) is satisfied. The value of compilation(S) has
been established by the part one proof. To prove the target language Hoare formula, we will
apply target language Hoare rules to it to produce a logical theorem which we will then prove.
Thus the second proof part must use the semantics of the target and source language, but does
not involve code in the language in which the compiler is written, nor its semantics.


3.9  Division  of  Labor 

Our belief is that program proving should be done interactively. We believe that the
human should be involved for the difficult and insightful portions of a proof, while the
machine should handle routine and repetitive parts.

As  a result  of  the  division  of  the  proof  into  two  parts  as  described  above,  the  first  part 
is  within  the  capabilities  of  present  program  proving  systems.  In  a manner  consistent  with  our 
philosophy,  the  first  part  of  the  proof  of  MCO  was  in  fact  accomplished  on  the  Xivus 
interactive  program  proving  system. 

The second proof part of the MCO proof was tedious and could have used machine
assistance. Unfortunately, it must be carried out at a level above that of existing program
proving systems. The target code produced by the compiler is expressed in terms of the source
code being compiled and the state of the compiler (symbol table, etc.). Thus we cannot use
ordinary verification condition generators for part two, because they must operate on actual
code, not an expression representing the code. The consequence of this and other features (a
list of such features is in Section 7.2) missing from present proving systems is that part two
of the MCO proof was carried out as a hand proof rather than a machine-assisted proof. The
proof of the McCarthy-Painter compiler was simple enough to coerce the Xivus system
through the part two proof in order to show that the methods employed were mechanizable.
In Section 7.2 we suggest ways that a mechanized system could be built to allow more
complex part two proofs to be done interactively.


3.10 Brief Description of Xivus Verification System

The  Xivus  program  verification  (proving)  system  has  the  following  major  components: 

1.  A text  editor 

2. A parser for asserted programs written in the language Pascal [Wirth71,Jensen74]

3. A verification condition generator

4. A logical and arithmetic simplifier

5. A substitution package

6. An interactive theorem prover

7. An interactive top level program to direct and aid the proof

The system is the one described by Good, London, and Bledsoe [Good75] with further
evolutionary changes. The verification condition generator is essentially that of Igarashi,
Luckham, and London [Igarashi75], and the theorem prover was derived from the
Bledsoe-Bruell prover [Bledsoe73].

Typically  an  asserted  program  is  entered  into  the  system  via  the  editor  and  is  then 
parsed,  and  its  verification  conditions  are  generated.  The  simplifier  and  substituter  usually 
prove  many  of  the  verification  conditions  and  shorten  the  remaining  ones  to  more  manageable 
size.  Those  remaining  ones  are  then  submitted  to  the  theorem  prover.  Upon  discovery  of  a 
false  or  unprovable  verification  condition,  the  problem  must  be  located  in  either  the  program 
code  or  assertions,  and  the  steps  repeated.  The  system  has  some  abilities  to  recognize  portions 
of  the  proof  which  remain  unchanged  and  thus  need  not  be  reproved. 


3.11  Structural  Induction 

Structural induction has been used in practically all the proofs of operationally
expressed compilers, including that of MCO. Structural induction may be stated as: If we have
proved that a program works for each possible structural type of data while assuming the
program works on the smaller pieces of data, then we have proved the program works. The
basis for the induction lies in the proof of the structural types which do not contain other
types. Structural induction is well suited to proving compilers because source languages are
usually defined in terms of structural or syntactic types. Further, source languages are usually
inductively  defined,  since  some  syntactic  types  may  contain  arbitrarily  complex  other  syntactic 
types. 
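
The shape of such an argument can be seen on a toy compiler whose cases follow the syntactic types of its source language. The sketch below is purely illustrative: it is neither MCO nor the McCarthy-Painter compiler, and the instruction names (LI, LOAD, STO, ADD), the single accumulator, and the function names are all assumptions made for the example. Constants and variables are the basis cases; a sum is the inductive case, whose code is built from the code for its two smaller pieces.

```python
def evaluate(e, env):
    """Source semantics: an int is a constant, a str a variable, a tuple a sum."""
    if isinstance(e, int):
        return e
    if isinstance(e, str):
        return env[e]
    _, e1, e2 = e
    return evaluate(e1, env) + evaluate(e2, env)


def compile_expr(e, loc, t):
    """Compile e for a one-accumulator machine; t is the first free temporary."""
    if isinstance(e, int):                 # basis: constant
        return [("LI", e)]
    if isinstance(e, str):                 # basis: variable
        return [("LOAD", loc[e])]
    _, e1, e2 = e                          # inductive case: sum of two smaller pieces
    return (compile_expr(e1, loc, t) + [("STO", t)]
            + compile_expr(e2, loc, t + 1) + [("ADD", t)])


def run(code, memory):
    """Target semantics: execute the instructions and return the accumulator."""
    acc, mem = 0, dict(memory)
    for op, arg in code:
        if op == "LI":
            acc = arg
        elif op == "LOAD":
            acc = mem[arg]
        elif op == "STO":
            mem[arg] = acc
        elif op == "ADD":
            acc += mem[arg]
    return acc


env = {"x": 2, "y": 5}
loc = {"x": 0, "y": 1}                     # location table for the variables
e = ("PLUS", ("PLUS", "x", 3), "y")
code = compile_expr(e, loc, 2)             # temporaries start above the variables
assert run(code, {0: 2, 1: 5}) == evaluate(e, env)
```

A proof by structural induction would establish the final equality for every expression by treating each of the three cases once, assuming it for e1 and e2 in the sum case; the assertion above only tests one instance.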


3.12  Optimizing  Compilers 

In  treating  separately  each  syntactic  type  of  the  source  language,  we  assume  that  the 
results  of  compiling  are  truly  separate.  That  is,  the  target  code  produced  from  compiling 
syntactic  type  X must  not  depend  on  whether  that  occurrence  of  X contains,  for  instance,  type 
Y.  Similarly  the  target  code  from  type  X should  not  depend  on  whether  that  occurrence  of  X 
is  contained  in,  for  instance,  type  Z.  This  independence  may  in  fact  be  given  as  a definition 
of  a non-optimizing  compiler. 

If the target code produced depends on what is contained in the syntactic type, it can be
treated by the techniques here. For example, we could have a syntactic type X compile into
different code when its arguments are constants. This could be treated simply by defining a
new syntactic type X-with-constants, and applying the techniques used here.

Optimizations  in  which  the  target  code  produced  depends  upon  the  context  of  its  source 
would  not  be  proved  by  the  techniques  demonstrated  here,  but  as  extensions  to  these 
techniques  or  as  a separate  proof  of  equivalence  to  the  non-optimized  version.  More  will  be 
said  about  this  in  Section  l.i. 


3.13  Axiomatic  Description  of  Lisp 

MCO  deals  extensively  with  list  structures,  for  recognizing  source  language  syntactic 
types,  for  extracting  parts  from  the  source  language,  for  building  and  using  the  location  table, 
and  for  building  the  pieces  of  target  language.  The  compiler  was  written  in  a modified  Pascal 
which  allows  functions  to  return  values  of  complex  type.  Dealing  with  lists  was  accomplished 
by  functions  with  the  same  names  and  properties  as  basic  Lisp  functions,  except  having  fixed 
numbers of arguments. Those Lisp-like functions were never supplied as Pascal code, nor
were they given assertions. Therefore no properties of these functions appeared in the
resulting verification conditions. A standard set of properties was entered directly into the
theorem prover as rewrite rules to be used in the proofs of the verification conditions. Those
rules may be found in Section A.3. In using this technique, we recognize that such rewrite
rules  can  define  functions  as  well  as  their  code  could,  and  further  that  in  the  environment  of 
the  proof  of  MCO  definition  by  rewrite  rules  is  more  easily  used  than  the  code  would  be.  This 
technique  can  be  used  for  functions  which  are  to  be  considered  intrinsic  to  the  source 
language.  In  the  verification  strategy  used  in  Alphard  [Wulf76],  this  technique  may  be  used 
on  both  intrinsic  functions,  such  as  those  describing  sequences,  and  on  functions  for  which 
code  is  given  by  the  programmer.  In  the  latter  case,  though,  the  programmer  must  establish 
that  the  code  is  consistent  with  the  axiomatic  specifications. 


3.14 Axiomatic Definition of Source Language Syntax

We use in the compiler assertions a set of functions (predicates) which tell us if any
given source language expression is a certain syntactic type. For instance, in the proof of
MCO, the function ISAND(S) is true if and only if source language expression S is an AND
type. The definitions of such functions are not supplied as code, but as axioms to be used by
the theorem prover in proving the verification conditions. For example, the following axioms
are needed for the AND syntactic type. The dot after the X indicates universal quantification
of the X over the entire axiom. The prefixed single quote, as in Lisp, indicates a constant.

1. ISAND(X.) → CAR(X.) = 'AND

2. ISAND(X.) → NOT NULL(X.)

∧ NOT (X. = 'T)

∧ NOT NUMBERP(X.)

∧ NOT ATOM(X.)

Note that the first axiom says what the type is, and the second axiom says what the type
is  not.  In  fact  the  four  conclusions  of  the  second  axiom  represent  exactly  the  four  syntactic 
types  which  are  checked  by  the  compiler  before  checking  the  AND  type.  Most  of  the  "what 
the type is not" properties could be derived from properties of Lisp functions involved, and
therefore  we  could  have  used  Lisp  properties  in  the  compiler  proof  instead.  For  example,  our 
knowledge  of  Lisp  would  allow  us  to  conclude  that  any  form  that  has  a CAR  (such  as  AND) 
is  not  any  of  the  atomic  forms  NIL,  T,  a number,  or  a general  atom.  However,  we  felt  that 
the  properties  as  expressed  above  are  easily  written  by  referring  to  the  source  language  syntax 
definition,  and  that  the  properties  as  given  are  directly  applicable  in  a compiler  proof. 

In  order  to  express  the  fact  that  the  arguments  to  an  AND  type  consist  of  a list  of  valid 
expressions,  we  also  need  the  axiom: 

3. ISEXPRESSION(X.) ∧ CAR(X.) = 'AND → ISLISTOFEXP(CDR(X.))

There are also some axioms expressing that a LISTOFEXP expression consists of legal
expressions. To accommodate the recursion on the number of arguments of AND, we need
one further axiom:

4. ISLISTOFEXP(Z.) → ISAND(CONS('AND,Z.))

Similar  sets  of  axioms  are  needed  for  the  other  syntactic  types  of  the  source  language  of 
MCO.  A list  of  all  such  axioms  is  given  in  Section  A.2. 
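
One way to read these axioms is as properties that any concrete definition of the predicates must satisfy. The sketch below is an illustration under stated assumptions, not the definition used in the proof: Lisp lists are modelled as Python lists with NIL as the empty list, the source language is cut down to atoms and AND forms, and the function names are hypothetical.

```python
def is_list_of_exp(z):
    """Hypothetical ISLISTOFEXP: a list whose elements are all expressions."""
    return isinstance(z, list) and all(is_expression(e) for e in z)


def is_and(x):
    """Hypothetical ISAND: a non-empty list headed by 'AND with expression arguments."""
    return (isinstance(x, list) and len(x) > 0
            and x[0] == "AND" and is_list_of_exp(x[1:]))


def is_expression(x):
    """Hypothetical ISEXPRESSION for a language of atoms and AND forms only."""
    if x == [] or isinstance(x, (int, str)):      # NIL, numbers, T and other atoms
        return True
    return is_and(x)


samples = [[], "T", 3, "FOO", ["AND"], ["AND", "T", ["AND", 1, []]], ["OR", "T"]]
for x in samples:
    if is_and(x):
        assert x[0] == "AND"                      # axiom 1: CAR(X) = 'AND
        assert x != [] and x != "T"               # axiom 2: not NIL, not T,
        assert not isinstance(x, (int, str))      #          not a number or atom
    if is_expression(x) and isinstance(x, list) and x and x[0] == "AND":
        assert is_list_of_exp(x[1:])              # axiom 3: CDR(X) is a list of exps
for z in ([], ["T"], [1, ["AND"]]):
    assert is_list_of_exp(z) and is_and(["AND"] + z)   # axiom 4: CONS('AND, Z)
```

In the proof itself no such code exists; the theorem prover works from the axioms alone.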

3.15 Axiomatic Stack Proof

The proof of the compiler MCO in Hoare rule terms makes certain assumptions about
the  run-time  stack  during  the  execution  of  certain  strings  of  target  code.  These  assumptions 
are  explained  precisely  in  Chapter  4,  but  may  be  roughly  described  by:  the  contents  of  the 
stack  remain  unchanged  during  execution  of  a string  t of  target  code.  That  is,  any  items 
added  to  the  stack  by  executing  t must  be  removed,  but  none  of  the  original  items  in  the  stack 
may  be  removed.  When  this  property  is  true  of  a string  t,  we  denote  it  by  stackok(t).  To 
discharge  the  proof  of  these  assumptions  we  wish  to  describe  (with  axioms)  whether  certain 
strings  of  target  language  statements  will,  when  executed,  modify  the  stack,  and  if  so,  how  the 
stack  will  change.  The  axioms  will  then  be  applied  to  the  theorems  to  be  proved  to  produce 
subgoals,  to  which  further  axioms  are  applied  until  all  goals  and  subgoals  are  proved.  The 
following  is  a brief  description  of  the  stack  axioms  used  in  the  proof  of  MCO. 

First  we  wish  to  give  axioms  describing  how  objects  are  pushed  onto  and  popped  off  of 
the stack. The popping of the stack will be done with a single SUB instruction (even if
several elements are to be popped), while the pushing onto the stack will be done by several
PUSH instructions spread throughout the target code. In order to pop off the stack exactly
the number of items earlier pushed onto it, we need a function that tells us how many PUSH
instructions  are  contained  in  certain  strings  of  target  code.  So  we  define  containspushes(t,n) 
by: 

C1: containspushes(«'PUSH ...»,1)

C2: containspushes(«a ...»,0) when a ≠ 'PUSH

C3: containspushes(t1,n1) ∧ containspushes(t2,n2) →
containspushes(< !t1 !t2 >, n1 + n2)

where t1 and t2 are lists of statements. Note that < !t1 !t2 > is the list containing all the
statements in the list t1 followed by all the statements in the list t2. C1 and C2 give the
containspushes  property  for  strings  that  contain  a single  instruction,  and  C3  allows  us  to 
combine  strings  to  arbitrary  numbers  of  target  instructions. 
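
Read as a recursive definition, C1 through C3 simply count PUSH statements. The following sketch assumes a hypothetical encoding in which a statement is a Python tuple whose first element is the operation name and a label is a plain string; `contains_pushes` returns the n for which containspushes(t,n) holds.

```python
def contains_pushes(code):
    """Return the n such that containspushes(code, n) follows from C1 to C3."""
    if not code:
        return 0
    first, rest = code[0], code[1:]
    # C1 and C2: a single statement contributes 1 if it is a PUSH, otherwise 0.
    n_first = 1 if isinstance(first, tuple) and first[0] == "PUSH" else 0
    # C3: the counts of concatenated strings add.
    return n_first + contains_pushes(rest)


t1 = [("PUSH", "P", 1), ("MOVE", 1, 0, "P")]
t2 = [("PUSH", "P", 2), "L1"]
assert contains_pushes(t1) == 1
assert contains_pushes(t2) == 1
assert contains_pushes(t1 + t2) == contains_pushes(t1) + contains_pushes(t2)
```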

With the containspushes property we may then axiomatize the stackok property we
described above. For reasons of function linkage explained more fully in Chapter 4 we also
define stackokreturns, which describes a string of target code which has the stackok property
with the exception of an additional concluding POPJ statement. POPJ returns to a location
which is taken from the stack.

The  following  axioms  then  describe  stackok  and  stackokreturns.  Explanations  of  the 
axioms  are  found  below. 

S1: stackok(t) → stackokreturns(< !t <'POPJ 'P> >)

S2: stackok(t1) ∧ containspushes(t2,n) →
stackok(< !t2 !t1 <'SUB 'P <'C 0 0 n n>> >)

S3: stackok(t1) ∧ stackok(t2) → stackok(< !t1 !t2 >)

S4: stackok(«'CALL ...»)

S5: stackok(«'MOVE ...»)

S6: stackok(«'MOVEI ...»)

S7: stackok(< l >)

where l is a label

S8: stackok(< < 'JRST 0 l > !t l >)

where t is a list of zero or more statements and l is a label

S9: stackok(t1) ∧ stackok(< !t2 l !t1 >) →
stackok(< < 'JUMPx i l > !t2 l !t1 >)

where 'JUMPx means 'JUMPE or 'JUMPN, and t1 and t2 are lists of zero or more
statements.

S1 is the definition of stackokreturns in terms of stackok. In S2 we state that n pushes
to the stack must be balanced by a later pop of n items, with the intervening instructions
leaving the stack ok. Concatenation of stackok strings is allowed by S3. We list certain target
instructions (including labels) that are stackok in S4 through S7. In S8 we state that a string
of statements is stackok if we execute an unconditional jump over all of them. S9 requires
that both paths which a conditional jump might take must be stackok for the entire assembly
of statements to be stackok.
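
The property these axioms capture can also be approximated by direct simulation. The sketch below deliberately swaps in a different technique from the axiomatic one used in the proof: it follows every forward path through a string of target code and tracks the stack height relative to entry. The tuple encoding of statements, the treatment of CALL, MOVE and MOVEI as height-neutral (as S4 through S6 assert), and the function name are assumptions; POPJ and stackokreturns are not modelled, and the simulation may accept strings that the axioms alone cannot derive.

```python
def stackok(code):
    """Check that every forward path ends at the entry stack height."""
    labels = {s: k for k, s in enumerate(code) if isinstance(s, str)}
    heights_at = {0: {0}}          # position -> relative heights that reach it
    for pos, stmt in enumerate(code):
        for h in heights_at.pop(pos, set()):
            if isinstance(stmt, str):                 # a label changes nothing
                successors = [(pos + 1, h)]
            elif stmt[0] == "PUSH":
                successors = [(pos + 1, h + 1)]
            elif stmt[0] == "SUB":                    # ("SUB", "P", ("C", 0, 0, n, n))
                successors = [(pos + 1, h - stmt[2][4])]
            elif stmt[0] == "JRST":                   # ("JRST", 0, label)
                successors = [(labels.get(stmt[2], -1), h)]
            elif stmt[0] in ("JUMPE", "JUMPN"):       # ("JUMPE", i, label)
                successors = [(labels.get(stmt[2], -1), h), (pos + 1, h)]
            else:                                     # CALL, MOVE, MOVEI
                successors = [(pos + 1, h)]
            for target, height in successors:
                # Reject backward or unknown jumps and pops below the entry height.
                if target <= pos or height < 0:
                    return False
                heights_at.setdefault(target, set()).add(height)
    return heights_at.get(len(code), set()) == {0}


pop1 = ("SUB", "P", ("C", 0, 0, 1, 1))
assert stackok([("PUSH", "P", 1), ("MOVE", 1, 0, "P"), pop1])        # shape of S2
assert stackok([("JRST", 0, "L"), ("PUSH", "P", 1), "L"])            # shape of S8
assert stackok([("JUMPE", 1, "L"), ("PUSH", "P", 1), pop1, "L"])     # both paths balanced
assert not stackok([("JUMPE", 1, "L"), ("PUSH", "P", 1), "L"])       # one path unbalanced
assert not stackok(["L", ("JRST", 0, "L")])                          # backward jump
```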

It  might  be  surmised  that  S9  could  have  been  more  simply  stated  as: 

S9S: stackok(t2) → stackok(< < 'JUMPx i l > !t2 l >)

While this simpler axiom is true and would be useful in our proofs, it is not strong enough to
prove some cases that arise. The shortcomings of S9S become obvious when it is realized that
stackok(t2) may be undefined. For example, t2 can contain a jump to a point in t1. Then
none of our axioms will allow us to define stackok(t2), but use of axiom S8 (possibly in
combination with S3 and others) will allow us to define stackok of the combination of t1, the
label l, and t2.

It may be noted that the simple forms of jumps output by the compiler are always
forward jumps, and hence there are no loops. Also, all execution paths that separate at a
conditional jump eventually rejoin below. It is this simplicity of target code that allows
comparatively simple axioms defining stackok with respect to jump statements. We do not
need to resort to finding execution paths or loops.

The fact that we will complete the stackok proofs of MCO with only these stackok
axioms will ensure that, as we have casually assured the reader, there are no backward jumps
in the target code produced by MCO. If there were backward jumps (at least in code that is
required to satisfy the stackok property), no axioms would apply to such code, and we would
be unable to continue a stackok proof. Only S8 and S9 have jumps in them, and both have
the forward direction of the jump built into them. Similar reasoning using axiom S9 ensures
that paths which split must rejoin below.
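
The forward-only shape of the jumps is what makes a single left-to-right treatment possible.
As an illustration only (this is not the axiom set S1 through S9, and it is not part of MCO),
the following Python sketch checks the intended property operationally: the stack height
never drops below its starting value and ends at that value. The instruction encoding, the
EFFECTS table (including its POP entry), and the function name are assumptions made for
the example.

```python
# Hypothetical net effect of each non-jump instruction on the stack height.
# CALL is taken as net zero on the assumption that the callee's POPJ removes
# the return address that CALL pushed.
EFFECTS = {"PUSH": 1, "POP": -1, "MOVEI": 0, "CALL": 0}
JUMPS = {"JRST", "JUMPE", "JUMPN"}  # JRST is the unconditional jump


def stackok(code):
    """One left-to-right pass over code whose jumps are all forward.

    `code` is a list whose items are labels (strings) or instruction
    tuples (opcode, operands...); a jump's last operand is its target.
    """
    depth = 0      # height relative to entry; None while code is unreachable
    pending = {}   # label -> height recorded by jumps that target it
    seen = set()   # labels already passed
    for item in code:
        if isinstance(item, str):  # a label: paths may rejoin here
            seen.add(item)
            arriving = pending.pop(item, None)
            if depth is None:
                depth = arriving
            elif arriving is not None and arriving != depth:
                return False  # paths rejoin at different heights
            continue
        if depth is None:
            continue  # skipped by an unconditional jump
        op, *args = item
        if op in JUMPS:
            target = args[-1]
            if target in seen:
                return False  # backward jump: outside this scheme
            if pending.setdefault(target, depth) != depth:
                return False
            if op == "JRST":
                depth = None
            continue
        depth += EFFECTS[op]
        if depth < 0:
            return False  # dropped below the starting height
    return depth == 0 and not pending
```

Code skipped by an unconditional jump is ignored, in the spirit of S8, and a height mismatch
at a label corresponds to the requirement in S9 that both paths of a conditional jump be
stackok. A backward jump is simply rejected, mirroring the observation that no axiom would
apply to it.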


3.16 The F Functions

In several places in the assertions we must speak just of the new target code produced
by a specific call to a compiler procedure. The arrangement of the compiler MCO does not
result in any compiler variable holding just the code resulting from a given routine, so we are
not able to use a compiler variable name for such references. Instead we pass an output file as
a variable parameter to many of the procedures of the compiler, to which those procedures
add statements of target code. Rather than try to extract from this output file the target code
which was added to the file since some earlier time, we will arbitrarily give that added code a
symbolic name. The name "F function" comes about because we have prefixed the letter F to
associated procedure names to create a unique new name by which new target code may be
referenced.

These F functions are an extension of Hoare and Wirth's method for variable
parameters [Hoare73]. In their method, the existence of a function is assumed in order to
express the final value of a variable parameter to a procedure as a function of the initial
values of the parameters. These functions are given arbitrary names for use in program
proving, and, except in very simply analyzable situations, these functions (or at least certain
important properties of them) must be supplied by the prover of the program. The F
functions differ from the Hoare and Wirth functions in that ours refer only to the part added
to the parameter rather than the entire final value.
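
What an F function denotes can be pictured operationally, even though the proofs never
compute it this way. The Python sketch below is a hypothetical illustration: comp_const
stands in for a compiler procedure that appends to an output file, and added_code returns
just the statements that one call appended. Both names and the sample instructions are
assumptions, not routines of MCO.

```python
def comp_const(value, outfile):
    """Hypothetical compiler procedure: append code loading `value` into register 1."""
    outfile.append(("MOVEI", 1, value))


def added_code(procedure, arg, outfile):
    """Run `procedure` and return only the statements it appended to `outfile`."""
    before = len(outfile)
    procedure(arg, outfile)
    return outfile[before:]


outfile = [("PUSH", "P", 1)]   # code emitted earlier (made up)
initial = list(outfile)        # plays the role of the initial value of the file
f_value = added_code(comp_const, 0, outfile)

# The final file is the initial file followed by the added part.
assert outfile == initial + f_value
```

The point of the design is that assertions can name the added part directly instead of
describing the whole final value of the file.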

The F functions are actually formally defined by the use of the assume function in the
theorem prover when proving the resulting verification conditions. In other words, we use a
new function name in the assertions about the compiler without supplying code or assertions
describing that function. Then in the proof of the resulting verification conditions we may
assume certain properties about those functions, and that assumption constitutes the definition
of those functions.

An  example  of  this  technique  occurs  in  proving  the  case  involving  the  AND  syntactic 
type  of  the  compiler  MCO.  The  following  assertion  is  needed  in  procedure  COMPEXP  in 
order  to  carry  out  the  proof  of  other  assertions. 

OUTFILE =
    < !OUTFILE'
      !FCOMPEXP(EXP,M,LOCTABLE)
    >

where the suffixed single quote mark indicates the initial value of a variable parameter.

One  clause  of  one  of  the  verification  conditions  resulting  from  that  assertion  requires 
that  we  prove  the  following  conclusion  under  the  conditions  of  this  subcase  (syntactic  type  is 
AND,  more  than  zero  arguments). 

< !OUTFILE'
  !FCOMPEXP(EXP,M,LOCTABLE)
>
=
< !OUTFILE'
  !FCOMPANDOR(CDR(EXP),M,L1,FALSE,LOCTABLE)
  < 'MOVEI 1 < 'QUOTE 'T > >
  < 'JRST 0 L2 >
  L1
  < 'MOVEI 1 0 >
  L2
>

Recalling that FCOMPEXP has not yet been formally defined, we see that assuming this
subgoal in the theorem prover simply defines FCOMPEXP for this subcase as:

< !FCOMPANDOR(CDR(EXP),M,L1,FALSE,LOCTABLE)
  < 'MOVEI 1 < 'QUOTE 'T > >
  < 'JRST 0 L2 >
  L1
  < 'MOVEI 1 0 >
  L2
>

The full set of definitions for FCOMPEXP and other F functions for all cases in the
compiler MCO is given in Section A.1. We assume the value of FCOMPEXP (or any other
F function) only once for each case or subcase. The mutually exclusive definitions of source
language syntactic types assure us that no two definitions apply for any given piece of source
language, thus eliminating the possibility of this method producing an inconsistent definition
of an F function.


3.17  The  Gensym  Problem 

Gensym is a Lisp function of no arguments that produces a unique identifier every time
it is called. The compiler MCO uses it to produce unique labels in the target code.
Unfortunately it is not a mathematical function because it has the side effect of causing the
next call to it to produce a different value. It is the opposite of the aliasing problem (multiple
names for one value), because gensym has one name that may represent multiple values. It
complicates proving terribly if we cannot depend on F(X) = F(X), i.e., F = F.

One solution to this would be to write a gensym procedure complete with a variable
parameter that is passed everywhere throughout the program and incremented at every
gensym call. Its ever-changing value would present a base from which to construct an
ever-changing identifier. We think this solution is too low-level and messy for easy proof.
Instead we chose to use the verification condition generator mechanism for variable
parameters to procedures, which assigns unique identifiers to represent them after each
procedure call within a program unit. Therefore gensym was made a procedure with its former
function value passed back as a variable parameter instead. Then each call to gensym within a
program unit got its parameter renamed uniquely and consistently in the verification
conditions produced.
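
The difficulty, and the first alternative mentioned above (the one rejected as too low-level),
can be made concrete with a small Python sketch. It does not show the verification condition
generator's renaming, which is the solution actually adopted. The names gensym,
gensym_threaded, and the G prefix on generated labels are assumptions for illustration.

```python
import itertools

_counter = itertools.count(1)


def gensym():
    """Lisp-style gensym: the result depends on hidden state, so it is not a function."""
    return f"G{next(_counter)}"


def gensym_threaded(n):
    """Counter passed explicitly: the same input always gives the same output."""
    return f"G{n}", n + 1


assert gensym() != gensym()                      # F(X) = F(X) fails
assert gensym_threaded(7) == gensym_threaded(7)  # a true function of its argument

# Threading the counter through successive calls still yields distinct labels.
label1, n = gensym_threaded(1)
label2, n = gensym_threaded(n)
assert label1 != label2
```

The cost of the threaded form is visible even here: every caller must accept and return the
counter, which is the clutter the text wishes to avoid in the proofs.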

Another problem associated with gensym is the violation of the assumption that
procedure Exit assertions are expressed in terms of only the procedure parameters (and
possibly their initial values). A reference in an Exit assertion to a variable existing only
inside the procedure (i.e., a local variable, not a parameter) could clash with another variable
of the same name in any other procedure calling it. For this reason, the Exit assertion must be
considered to be outside the scope of locally declared variables. However, it is often necessary
in the compiler MCO to refer in an Exit assertion to the variable produced by a gensym call
within a procedure. For example, the Exit assertion of COMPEXP (the main compiling
routine for expressions) for the case of a COND syntactic type with no arguments is:

ISCOND(EXP) ∧ NULL(CDR(EXP)) →
    OUTFILE =
        < !OUTFILE'
          L5
        >

The  suffixed  single  quote  mark  indicates  initial  value  of  a variable  parameter.  The  L5  is  a 
compiler  variable  into  which  gensym  returns  a unique  name  for  use  as  a target  code  label. 

When COMPEXP is called by a routine, say X, the verification condition being
generated in X will have added to it the properties from COMPEXP's Exit assertion. Thus
L5 from within COMPEXP could clash with an L5 in X; that is, we could have L5 referring
to two different objects in the same verification condition. This would invalidate the proof,
assuming we actually use those clashing uses of L5 in the proof of the verification condition.
However, if we proved a verification condition, for instance, by means of its hypothesis being
false, the proof would remain valid even if clashing variable uses occurred in the conclusion
of that verification condition, since we did not actually use the conclusion.

To prevent clashes from even occurring in most verification conditions, we have named
uniquely the arguments to gensym on all the calls throughout the compiler. Thus, in the
above example from COMPEXP, no other routine will contain a variable name that could
clash with gensym argument L5 introduced when COMPEXP is called. Even with the
unique argument naming, two possible sources of clash remain that could invalidate a proof.
One is the case of a recursive call. For example, if COMPEXP calls itself, the L5 from the
lower call will be introduced into the verification condition that may already refer to L5 in the
top level instance of executing COMPEXP. The other possibility is that a verification
condition could involve two calls to the same routine, and therefore introduce the gensym
argument variable twice to mean two different scopes of that variable.

The part one proofs were examined for cases where a gensym variable name was
introduced into a verification condition by a procedure call so that a gensym variable clashed
with a variable name already used to mean something else. In every case where this did
occur, the proofs were carried out without making any use of the clauses in which the clashing
reference occurred. Thus the proofs may be considered valid. The problem of searching for
such clashes was simplified greatly by the separate asserting and proving of each syntactic
type. Had assertions for all cases been entered at once, a great many more clashes would have
occurred, but all in clauses of the verification conditions that pertained to other cases and thus
would not be used in the proof of that case.

An alternative method of avoiding such clashes would have been to pass all gensym
arguments as variable parameters to every procedure that either used them or called a
procedure that used them throughout MCO. Then the variable parameter mechanism of the
verification condition generator would rename them to prevent clashes. Unfortunately it
would also greatly increase the size of the resulting verification conditions. It appears that the
examination to determine that clashes did not occur was indeed easier than it would have
been to wade through more complex verification conditions at nearly all stages of the part one
proof.

As mentioned above, all references in an Exit assertion to local variables must normally
be eliminated. A local reference is eliminated by using an expression of the input parameters
that is equal to the local variable. But the gensym variable must be referred to in an Exit
assertion by its name, not as a function of the procedure's input parameters, because the
gensym variable, by the design of gensym, bears no relationship to previously known values.
That is, the only requirement on gensym is that its output be different from previous output.
This is why the gensym variables are different from other local variables in that they cannot
be expressed as functions of the input parameters and therefore must appear in the Exit
assertion.


4.1 Derivation and Statement of the Main Result

The  function  of  a compiler  is  to  translate  a program  from  a language  for  which  we  do 
not  have  a suitable  means  of  executing  it  (the  source  language)  into  another  language  for 
which  we  do  (the  target  language).  Therefore  the  statement  of  correctness  of  a compiler  is  that 
for  all  source  programs  the  target  program  produced  by  the  compiler  produces  the  same  results 
when  run  as  the  source  program  would  if  it  were  run.  We  will  precisely  define  the  effects  of 
executing  a statement  by  the  use  of  Hoare  type  proof  rules.  Thus  our  first  attempt  at  the 
statement  of  correctness  for  compiler  MCO  would  be: 

For all programs A ( P{A}Q → P{COMPILATION(A)}Q )

We  will  always  assume  that  the  source  program  A is  a legally  executable  program. 
Preventing  compile-time  and  run-time  errors  is  a problem  which  we  will  not  address.  We 
simply  wish  to  show  that  if  a source  program  can  be  proved  to  accomplish  a given  effect  by 
use  of  the  Hoare  formalism,  then  its  compilation  will  accomplish  the  same  effect. 

In order to tailor our statement of correctness to structural induction, we will express
that statement in terms of the source language structural units, expressions and statements,
rather than in program terms. The statement of correctness becomes:

For all statements or expressions S ( P{S}Q → P{COMPILATION(S)}Q )

In  the  proof  of  compiler  MCO  we  will  first  treat  the  one  syntactic  type  of  our  source 
language  that  is  a statement,  the  function  definition.  For  it  we  have  in  the  source  language 

Entry(f) {S} Exit(f)

where S is of the form < 'DE f <a1 ... an> exp >.

The Entry and Exit represent the only P and Q for which P{S}Q holds when S is a
function definition. Thus the attempt at a statement of correctness is:

ISFUNCTIONDEF(S) → (Entry(f) {COMPILATION(S)} Exit(f))

This statement of correctness suffers from having made the assumption that the
assertions may be stated in the same terms for both source and target languages. This is
typically not the case. The source language will usually use symbolic names for variables,
while the target language will usually use memory locations, or at least indirect memory
references, such as locations relative to a stack pointer.

A similar problem occurs for constants in the source language. For compiler MCO, only
one constant is represented differently in the source and target languages. So we must
substitute 0 (the target representation) for NIL (the source representation) in assertions.

The function calling convention of a compiler generally defines a fixed group of
locations in the target machine in which we will expect to find the arguments to a called
function. For MCO, the arguments will be passed in the registers, which we designate as R1,
R2, ... , Rn. Similarly the result of a function will be passed back in register one (R1).

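Thus for a function definition statement S, we will state the correctness of the compiler
MCO as (using the substitution notation of Section 3.6):

$$\mathrm{Entry}(f)\,{}^{\mathrm{NIL}}_{0}\,{}^{\mathrm{a1}}_{\mathrm{R1}'} \cdots {}^{\mathrm{an}}_{\mathrm{Rn}'} \;\; \{\mathrm{COMPILATION}(S)\} \;\; \mathrm{Exit}(f)\,{}^{h}_{\mathrm{R1}}\,{}^{\mathrm{NIL}}_{0}\,{}^{\mathrm{a1}}_{\mathrm{R1}'} \cdots {}^{\mathrm{an}}_{\mathrm{Rn}'}$$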

where S is of form < 'DE f <a1 ... an> exp >, the identifier h is the designation used in Exit(f)
for the function value returned by the function f, and Ri' is the initial value of register Ri.
Since the initial and present values of the registers are the same at the time of the Entry
assertion, the single quotes signifying initial values are unnecessary for the Entry. However,
they will make clearer the fact that we are referring to initial values during the proof, so they
will be used here. It might be noted that h must be an identifier, not an expression, because
the substitution formalization is valid only when substituting for identifiers.

Note that since the registers (Ri's) are distinct from the formal arguments (aj's), we may
make the substitution of registers for arguments either sequentially or simultaneously. We
choose sequentially because more of the substitution simplifications apply to sequential forms.
Note also that Entry and Exit can be functions of only formal arguments; Exit may also
include the returned function value. There are no global or free variables inherited in this
source language. Therefore we have renamed into target language all variable names in Entry
and Exit.
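
The remark that sequential and simultaneous substitution coincide when the replacements
are distinct from the names being replaced can be checked on a small case. The Python
sketch below is only an illustration: assertions are modelled as nested tuples of identifier
strings, and the LESSP assertion and both function names are made up for the example.

```python
def subst_simultaneous(term, mapping):
    """Replace every identifier in `term` by mapping[identifier], all at once."""
    if isinstance(term, str):
        return mapping.get(term, term)
    return tuple(subst_simultaneous(t, mapping) for t in term)


def subst_sequential(term, pairs):
    """Apply the substitutions one after another, left to right."""
    for var, replacement in pairs:
        term = subst_simultaneous(term, {var: replacement})
    return term


entry = ("LESSP", "a1", "a2")            # a made-up Entry assertion
pairs = [("a1", "R1'"), ("a2", "R2'")]
# Replacements distinct from the formal arguments: the two notions agree.
assert subst_sequential(entry, pairs) == subst_simultaneous(entry, dict(pairs))

swap = [("a1", "a2"), ("a2", "a1")]
# Replacements that mention the formal arguments: the two notions differ.
assert subst_sequential(entry, swap) == ("LESSP", "a1", "a1")
assert subst_simultaneous(entry, dict(swap)) == ("LESSP", "a2", "a1")
```

The second case shows why distinctness matters: a later sequential step can rewrite a name
that an earlier step introduced.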

Now the question arises, by what name in the MCO compiler code is the compilation of
S returned? The answer is almost COMP(S,OUTFILE). What COMP(S,OUTFILE) actually
returns is the initial value of the sequential output file OUTFILE with the compilation of S
appended to the end. Therefore we will define a new name to describe exactly the
compilation of S, rather than almost what we want. FCOMP(S) is (by definition) the target
code added to OUTFILE to obtain the returned value of COMP(S,OUTFILE). Because of
the way we wrote the Exit assertion for COMP, it is obvious what portion of that assertion is
represented by FCOMP.

Having arrived at the statement of correctness for MCO in the case of the function
definition, we now turn to the case of expressions. The source language Hoare rule for
expressions must account for the Entry and Exit conditions of the functions contained in the
expression, and the way in which they are nested. We will define below a precondition Pre
and a postcondition Post to accomplish this. The source language Hoare rule is:


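Pre(S) ∧ (Post(S) → Q) {S} Q

where S is of form <f b1 ... bn>, f has been defined (by the user or the basic Lisp language
definition) with formal arguments a1 ... an, and

$$\mathrm{Pre}(S) = \mathrm{Pre}(\mathrm{b1}) \land \cdots \land \mathrm{Pre}(\mathrm{bn}) \land \bigl(\mathrm{Post}(\mathrm{b1}) \land \cdots \land \mathrm{Post}(\mathrm{bn}) \rightarrow \mathrm{Entry}(f)\,{}^{\mathrm{a1}\,\ldots\,\mathrm{an}}_{\mathrm{b1}\,\ldots\,\mathrm{bn}}\bigr),$$

and

$$\mathrm{Post}(S) = \mathrm{Post}(\mathrm{b1}) \land \cdots \land \mathrm{Post}(\mathrm{bn}) \land \mathrm{Exit}(f)\,{}^{\mathrm{a1}\,\ldots\,\mathrm{an}}_{\mathrm{b1}\,\ldots\,\mathrm{bn}}\,{}^{h}_{S}$$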

Note that the substitutions of actual for formal arguments must be simultaneous. The
substitution for the result h may be done after the argument substitution (rather than
simultaneously) only by assuming h is distinct from the a's and b's. This can be made so by
renaming h for purposes of this proof.

Pre(S) then represents the collection of all necessary preconditions nested in the
expression, while Post(S) is the collection of all results in the expression. There will be
shortened forms of these definitions of Pre and Post in cases (AND, OR, COND) where not
all arguments are necessarily evaluated. Shortening of these forms may also occur for certain
functions whose Entry or Exit is TRUE, allowing logical simplification. A full accounting of
Pre and Post for all cases of MCO is given in figure 4-1.

It should be noted that this form of Hoare rule nests the Entry and Exit conditions in
exactly the order that they are nested in the expression S. This allows the source language to
contain dependencies between the Entries and Exits. The dependencies might be of the kind
where the Entry condition of a function is implied by the Exit condition of its argument. For
example, an Exit condition of the Lisp function CONS is that its result is not atomic, which
would satisfy the Entry condition of CAR (that its argument not be atomic) in the expression
CAR(CONS(X,Y)). It is clear by the duplication of the original nesting of the source
expression into the Pre and Post conditions that this Hoare rule formulation will properly
account for these dependencies.
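
The CAR(CONS(X,Y)) example can be worked mechanically. The Python sketch below is a
simplified illustration of the general (fully evaluated) definitions of Pre and Post, not the
table of figure 4-1. Formulas are plain strings; the SPECS table is an assumption that
records only the two conditions named in the text and takes every other Entry and Exit to
be TRUE.

```python
# Entry and Exit written as functions that perform the substitution themselves:
# Entry takes the actual arguments, Exit takes the result term h first.
SPECS = {
    "CONS": (lambda x, y: "TRUE", lambda h, x, y: f"not ATOM({h})"),
    "CAR": (lambda x: f"not ATOM({x})", lambda h, x: "TRUE"),
}


def show(e):
    """Print a nested tuple such as ("CAR", "X") as CAR(X)."""
    if isinstance(e, str):
        return e
    return f"{e[0]}({','.join(show(b) for b in e[1:])})"


def conj(parts):
    """Conjunction with TRUE conjuncts dropped."""
    kept = [p for p in parts if p != "TRUE"]
    return " and ".join(kept) or "TRUE"


def post(e):
    if isinstance(e, str):  # a variable: nothing to assert
        return "TRUE"
    f, *bs = e
    exit_f = SPECS[f][1]
    return conj([post(b) for b in bs] + [exit_f(show(e), *map(show, bs))])


def pre(e):
    if isinstance(e, str):
        return "TRUE"
    f, *bs = e
    entry = SPECS[f][0](*map(show, bs))
    hyp = conj(post(b) for b in bs)
    if entry == "TRUE":
        implied = "TRUE"
    elif hyp == "TRUE":
        implied = entry
    else:
        implied = f"({hyp} -> {entry})"
    return conj([pre(b) for b in bs] + [implied])


s = ("CAR", ("CONS", "X", "Y"))
print(pre(s))   # (not ATOM(CONS(X,Y)) -> not ATOM(CONS(X,Y)))
print(post(s))  # not ATOM(CONS(X,Y))
```

The precondition that comes out is an implication whose hypothesis is the Exit of CONS and
whose conclusion is the Entry of CAR, so it holds trivially. That is the dependency the
nesting is meant to capture.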

Our first attempt at a statement of correctness for expressions is:

ISEXPRESSION(S) → (Pre(S) ∧ (Post(S) → Q) {COMPILATION(S)} Q)

But now we must return to the old problem of renaming variables to target language
locations. An expression may use any variable names declared by a containing function
definition or lambda expression.

Thus we will define a list v of variable names declared and a list w of memory
locations. The list v will have all newly declared variables added to its beginning, and the
corresponding location assigned will be added to the beginning of w simultaneously. Similarly
items will be removed when we leave their scope. Thus this pair of lists, dynamic during
compilation of a program, will have in v the name of any variable that may be used in an
expression, and will have in the corresponding position in w the target language location
assigned. If a variable is declared more than once, the most recent declaration is the valid
one, so we must use the first occurrence of that variable in v.

Fortunately the concept of such a pair of lists has already been invented and is called a
symbol table. MCO, like most compilers, uses a symbol table, so we will be able to extract v
and w from LOCTABLE, the variable within MCO's data that holds the current symbol
table. Because MCO uses locations relative to a stack pointer, we will have to make a
relative-to-absolute adjustment to put actual locations in w. But other than that, extracting v
and w from LOCTABLE will be simply the extraction of a pair of lists from a list of pairs.

LOCTABLE holds Lisp dotted pairs of associated variable names and locations in the form:

< <NAME1 . LOC1> ... <NAMEr . LOCr> >

Thus v and w will be given by:

v = < NAME1 ... NAMEr >

w = < m[M+P+LOC1] ... m[M+P+LOCr] >


where m designates an array of memory used as a stack. When variables are declared, MCO
allocates space for them in m beginning at m[P+1], since P is a pointer to the last used
location in m. The LOCi locations are relative to the run-time value of P at entry to the
present function. Since P will be updated as execution proceeds, we will keep track of how far
the present value of P is from the initial (function entry) value with a variable called
M. It is minus the number of stack locations locally used, so that M+P is the initial value of
P.
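
The extraction of v and w is simple enough to write out. The Python sketch below is an
illustration under stated assumptions: LOCTABLE is modelled as a list of (name, relative
location) tuples, each entry of w is represented by its index M+P+LOCi into m rather than
by the memory cell itself, and the function names and sample values are made up.

```python
def split_loctable(loctable, M, P):
    """Split a list of (name, relative location) pairs into v and w."""
    v = [name for name, _ in loctable]
    w = [M + P + loc for _, loc in loctable]  # index into m of m[M+P+LOCi]
    return v, w


def location_of(name, v, w):
    """Most recent declaration wins: use the first occurrence of name in v."""
    return w[v.index(name)]


# Made-up values: P was 7 at function entry and three locations have been
# used since, so P = 10 and M = -3 (M + P recovers the entry value of P).
loctable = [("X", 3), ("Y", 2), ("X", 1)]  # the first X shadows the later one
v, w = split_loctable(loctable, M=-3, P=10)
assert v == ["X", "Y", "X"]
assert w == [10, 9, 8]
assert location_of("X", v, w) == 10
```

Because new declarations are added at the front of both lists, a plain first-match search
gives the innermost declaration without any extra scope bookkeeping.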

Thus the statement of correctness of MCO requires the substitution of locations from w
for the variable names of v in source assertions to get target assertions. Since the notation we
will use for locations will not be the same as that used for source variables, the substitution
may be simultaneous or sequential. Again we will choose sequential to ease the simplification
of substitutions. The statement of correctness for expressions is now:

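$$\mathrm{ISEXPRESSION}(S) \rightarrow \Bigl( \mathrm{Pre}(S)\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \land \bigl( \mathrm{Post}(S)\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \rightarrow Q\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \bigr) \;\; \{\mathrm{COMPILATION}(S)\} \;\; Q\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \Bigr)$$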


However, this ignores the fact that expression values are accessed by the expression
itself in source language, but by use of register one (R1) in target language. Thus we must
substitute R1 for all occurrences in Q of the expression in question. We do not have a
notation for substitution for an expression. The substitution notation and its simplification
rules used here are only valid for substitution for an atom. But we will avoid the need to
express or simplify such substitution by letting the variable T (not to be confused with the
Lisp constant T) represent the result of substituting R1 for the expression in question.
Further, we will assume that T has already had w substituted for v and 0 for NIL. Thus T is
the target language equivalent of the source language assertion Q.

Then before the curly brackets of the Hoare formula we can say that T holds if we
substitute in it the expression (translated to target language terms) for R1. Thus we get:

$$\mathrm{ISEXPRESSION}(S) \rightarrow \Bigl( \mathrm{Pre}(S)\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \land \bigl( \mathrm{Post}(S)\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \rightarrow T\,{}^{\mathrm{R1}}_{S\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0}} \bigr) \;\; \{\mathrm{COMPILATION}(S)\} \;\; T \Bigr)$$


Note the strong resemblance of this formula to the Hoare rule that would result from
assigning

$$S\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0}$$

(the target language equivalent of S) to a variable called R1, where the substituted S is a
function call with a precondition and postcondition.

One further problem remains. The action in target language of an expression
evaluation involving a function call resembles a procedure call with variable arguments rather
than a function call with constant arguments. That is, registers R2 ... Rm may have their
values changed during evaluation of the function call. Of course R1 will be changed also, but
its value will be set to the value returned by the expression. Note that m is not necessarily the
same as n, the number of arguments, because nested function calls may wipe out more or fewer
than n registers.

Thus in our proof of MCO we must not allow the code to use the previous value of the
registers R2 ... Rm. We ensure this with the same technique used for variable procedure
arguments, that of quantifying the registers that are subject to new values during function
execution.

We define N(S) as the number of such registers. It is precisely defined for various
syntactic types in figure 4-2. Note that N(S) < 2 means no registers (other than R1) are
subject to change and no quantification is then intended by the notation ∀R2,...,RN(S).

To avoid confusion between the argument of N and the expression quantified, we will
always leave a space before the expression quantified. For example, ∀R2,...,RN(S) (T) means
∀R2,...,Rm (T) where m = N(S).

In a manner similar to FCOMP, we define a new function
FCOMPEXP(S,M,LOCTABLE) to be the target code added to OUTFILE to obtain the
returned value of COMPEXP(S,M,LOCTABLE,OUTFILE). The final version of the Hoare
formulas we must prove for the correctness of MCO is then:

1)
$$\mathrm{ISFUNCTIONDEF}(S) \rightarrow \Bigl( \mathrm{Entry}(f)\,{}^{\mathrm{NIL}}_{0}\,{}^{\mathrm{a1}}_{\mathrm{R1}'} \cdots {}^{\mathrm{an}}_{\mathrm{Rn}'} \;\; \{\mathrm{FCOMP}(S)\} \;\; \mathrm{Exit}(f)\,{}^{h}_{\mathrm{R1}}\,{}^{\mathrm{NIL}}_{0}\,{}^{\mathrm{a1}}_{\mathrm{R1}'} \cdots {}^{\mathrm{an}}_{\mathrm{Rn}'} \Bigr)$$

where S is of form < 'DE f <a1 ... an> exp >, h is the designation used in Exit(f) for the
function value returned by f, and Ri' is the initial value of register Ri.


2)
$$\mathrm{ISEXPRESSION}(S) \rightarrow \Bigl( \mathrm{Pre}(S)\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \land \bigl( \mathrm{Post}(S)\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0} \rightarrow \forall\, \mathrm{R2}, \ldots, \mathrm{R}N(S)\ \bigl( T\,{}^{\mathrm{R1}}_{S\,{}^{v}_{w}\,{}^{\mathrm{NIL}}_{0}} \bigr) \bigr) \;\; \{\mathrm{FCOMPEXP}(S, M, \mathrm{LOCTABLE})\} \;\; T \Bigr)$$

where LOCTABLE is of form < < NAME1 . LOC1 > ... < NAMEr . LOCr > >,
v = < NAME1 ... NAMEr >, w = < m[M+P+LOC1] ... m[M+P+LOCr] >, m is the array of
memory used as a stack, M is minus the size of that portion of the stack containing locally
declared variables, P is the stack pointer, Pre(S) is the collection of preconditions nested in S
(this is precisely defined in figure 4-1), Post(S) is the collection of postconditions nested in S
(see figure 4-1), and N(S) is the maximum number of registers that are modified during
execution of the compilation of S (this is precisely defined in figure 4-2).

It might be pointed out that for MCO these formulations of correctness (conditions 1)
and 2)) require the target language program to terminate if the source language program did.
The reason for this is that Luckham and Suzuki have shown [Luckham77] that Hoare rule
proving systems are adequate to show termination (with the use of recursion and loop
counters), coupled with the fact that MCO does not introduce into the target program any
sources of nontermination that were not in the source program. That is, there are no loops or
recursions in the compilation of any syntactic types except the same recursive calls that appear
in the source language. This assumes that all target language instructions involved always
terminate. It is often true of compilers that no new sources of nontermination are introduced,
except when source language operations are implemented by loops or recursion using simpler
operations, for example, implementing exponentiation by a loop containing multiplication. In
such a case, preservation of termination between the source and target language programs
would require a proof of the termination of any new (not appearing in the source language
program) loops or recursion that the compiler could produce in a target program.

There are some assumptions made about the stack in this formulation which we must
identify and justify. No distinction has been made in the Hoare formulas between the values
of P, the stack pointer, before and after the curly brackets containing target code. Similarly
the distinction is not made about m, the stack array. This implies that m has the same value
before execution of this target code as it has after, and similarly that P has the same value
before and after. But, in fact, both P and m are subject to change. So in order to use the
given Hoare formulas as statements of correctness, we must show that the values of P and m
after evaluation of each expression have been restored to the values before. Because all
values within the stack m that we may subsequently use lie somewhere in the segment of m
from m[1] to m[P], it is sufficient to show that this segment of m remains unchanged rather
than all of m. It is often the case during execution of an expression evaluation that items will
be added to the stack m beyond m[P] and that P will be raised. But we must show that at the
end of that expression evaluation the value in P is returned to its former value, thus
removing (or more precisely abandoning) items beyond m[P], and show that the segment of m
up to m[P] was not changed.

The only two instructions produced by MCO that set stack locations are PUSH and
CALL. They set only the location after the one to which P is pointing before the instruction
is executed. Thus we can see that a sufficient condition to accept the previously given
statement of compiler correctness is to show that P never drops below its beginning value
during evaluation of an expression, and that it returns with exactly the initial value. We
denote as stackok(z) this property of target code z.

One further property of the compiled code is required. The generally accepted
understanding of functions, as well as the Hoare formalization of functions, implies that a
function, when its execution is completed, always returns to the point in the code from which
it was called. The target code must explicitly perform this returning as a go to type of
statement. Since the returning is to be accomplished by leaving the return point on the stack,
we will combine its proof with the stackok proof. The mechanism used by target code is that
a CALL instruction not only jumps to the code for the called function, but pushes onto the
stack a location to which the called function is to return. Thus every compiled function must
end its execution with a POPJ instruction, which removes one item from the stack and jumps
to where that item points. Target code which is stackok except for a concluding POPJ
instruction will be called stackokreturns.
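
The two properties can be illustrated with a small checker. The following Python sketch is
not the axiomatization used in the proof. It assumes a hypothetical straight-line instruction
list in which the net effect of each instruction on P is known: PUSH raises P by one, SUB
lowers it by a given count, POPJ lowers it by one, and a CALL is assumed to have no net
effect because the callee is itself stackokreturns. The tuple encoding is an assumption made
for illustration only.

```python
# Hypothetical net effect of each instruction on the stack pointer P.
# The tuple encoding is an assumption for illustration, not MCO's LAP format.
def delta(instr):
    op = instr[0]
    if op == "PUSH":
        return 1
    if op == "SUB":           # ("SUB", n): abandon n items on the stack
        return -instr[1]
    if op == "POPJ":
        return -1
    return 0                  # CALL (callee assumed stackokreturns), MOVE, ...


def stackok(code):
    """P never drops below its starting value and ends exactly at it."""
    p = 0                     # P measured relative to its starting value
    for instr in code:
        p += delta(instr)
        if p < 0:
            return False
    return p == 0


def stackokreturns(code):
    """stackok except for a concluding POPJ instruction."""
    return bool(code) and code[-1][0] == "POPJ" and stackok(code[:-1])
```

Under these assumptions, a sequence that pushes two arguments, calls a function, and then
subtracts two from P is stackok, while one that lets P fall below its starting value is not.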

These  properties  stackok  and  stackokreturns  are  defined  by  a set  of  axioms.  The  proof 
of  MCO  then  requires  that  the  following  two  conditions  be  proved  for  each  syntactic  type: 

3)  ISFUNCTIONDEF(S)  →  stackokreturns(FCOMP(S))

4)  ISEXPRESSION(S)  →  stackok(FCOMPEXP(S,M,LOCTABLE))

The conditions 1) through 4) then constitute the statement of correctness of the compiler
MCO  and  will  be  proved  for  all  syntactic  types  of  source  code. 


FIGURE  4-1 

Definitions of Pre(S) and Post(S) for the syntactic cases of S:

case of NIL:

    Pre(S)  = TRUE
    Post(S) = TRUE

ISNUMBER case:

    Pre(S)  = TRUE
    Post(S) = TRUE

ISIDENTIFIER case:

    Pre(S)  = TRUE
    Post(S) = TRUE

AND (no arguments) subcase:

    Pre(S)  = TRUE
    Post(S) = (S = < 'QUOTE 'T >)

where < 'QUOTE 'T > is the Lisp constant for TRUE.

AND (n > 0 arguments) subcase:

    Pre( < 'AND b1 b2 ... bn > ) =
        Pre(b1) ∧ (b1 ≠ NIL → Pre( < 'AND b2 ... bn > ))

    Post( < 'AND b1 b2 ... bn > ) =
        (b1 = NIL → S = NIL) ∧
        (b1 ≠ NIL → S = < 'AND b2 ... bn >) ∧
        Post(b1) ∧
        (b1 ≠ NIL → Post( < 'AND b2 ... bn > ))

OR (no arguments) subcase:

    Pre(S)  = TRUE
    Post(S) = (S = NIL)

OR (n > 0 arguments) subcase:

    Pre( < 'OR b1 b2 ... bn > ) =
        Pre(b1) ∧ (b1 = NIL → Pre( < 'OR b2 ... bn > ))

    Post( < 'OR b1 b2 ... bn > ) =
        (b1 ≠ NIL → S = < 'QUOTE 'T >) ∧
        (b1 = NIL → S = < 'OR b2 ... bn >) ∧
        Post(b1) ∧
        (b1 = NIL → Post( < 'OR b2 ... bn > ))

NOT case:

    Pre( < 'NOT b1 > ) = Pre(b1)

    Post( < 'NOT b1 > ) =
        (b1 = NIL → S = < 'QUOTE 'T >) ∧
        (b1 ≠ NIL → S = NIL) ∧
        Post(b1)




COND (no arguments) subcase:

    Pre(S)  = TRUE
    Post(S) = (S = UNDEFINED)

COND (n > 0 arguments) subcase:

    Pre(S)  = Pre(c1) ∧ (c1 ≠ NIL → Pre(d1)) ∧
              (c1 = NIL → Pre( < 'COND <c2 d2> ... <cn dn> > ))

    Post(S) = Post(c1) ∧ (c1 ≠ NIL → Post(d1) ∧ S = d1) ∧
              (c1 = NIL → Post( < 'COND <c2 d2> ... <cn dn> > ) ∧
                          S = < 'COND <c2 d2> ... <cn dn> >)

where S is of form < 'COND <c1 d1> <c2 d2> ... <cn dn> >.

QUOTE case:

    Pre(S)  = TRUE
    Post(S) = TRUE

case of a function call:

    Pre(S)  = Pre(b1) ∧ ... ∧ Pre(bn) ∧

                                                   a1 ... an
              (Post(b1) ∧ ... ∧ Post(bn) → Entry(f)          )
                                                   b1 ... bn

                                                 a1 ... an
    Post(S) = Post(b1) ∧ ... ∧ Post(bn) ∧ Exit(f)
                                                 b1 ... bn

where S is of form < f b1 ... bn >

case of a lambda:

    Pre(S)  = Pre(b1) ∧ ... ∧ Pre(bn) ∧

                                                   a1 ... an
              (Post(b1) ∧ ... ∧ Post(bn) → Pre(exp)          )
                                                   b1 ... bn

                                                   a1 ... an
    Post(S) = Post(b1) ∧ ... ∧ Post(bn) ∧ Post(exp)
                                                   b1 ... bn

where S is of form < < 'LAMBDA <a1 ... an> exp > b1 ... bn >


FIGURE 4-2

Definitions of N(S) for the syntactic cases of S:

case of NIL:

    N(S) = 1

case of T:

    N(S) = 1

ISNUMBER case:

    N(S) = 1

ISIDENTIFIER case:

    N(S) = 1

AND (no arguments) subcase:

    N(S) = 1

AND (n > 0 arguments) subcase:

    N( < 'AND b1 b2 ... bn > ) =
        if b1 = NIL then N(b1) else max(N(b1), N( < 'AND b2 ... bn > ))

OR (no arguments) subcase:

    N(S) = 1

OR (n > 0 arguments) subcase:

    N( < 'OR b1 b2 ... bn > ) =
        if b1 ≠ NIL then N(b1) else max(N(b1), N( < 'OR b2 ... bn > ))

NOT case:

    N( < 'NOT b1 > ) = N(b1)

COND (no arguments) subcase:

    N(S) = 1

COND (n > 0 arguments) subcase:

    N( < 'COND <c1 d1> <c2 d2> ... <cn dn> > ) =
        if c1 ≠ NIL then max(N(c1), N(d1))
        else max(N(c1), N( < 'COND <c2 d2> ... <cn dn> > ))

QUOTE case:

    N(S) = 1

case of a function call:

    N( < f b1 ... bn > ) = max(N(f), N(b1), ..., N(bn))

case of a lambda:

    N( < < 'LAMBDA <a1 ... an> exp > b1 ... bn > ) =
        max(N(exp), N(b1), ..., N(bn))
FIGURE 4-3

Assertions about the compilation of code EXP for each syntactic case:

case of a function definition:

    ISFUNCTIONDEF(EXP) →
        OUTFILE =
            < ! OUTFILE'
              < 'LAP CADR(EXP) 'SUBR >
              ! FMKPUSH(LENGTH(CADDR(EXP)), 1)
              ! FCOMPEXP(CADDDR(EXP),
                         -LENGTH(CADDR(EXP)),
                         PRUP(CADDR(EXP), 1)
                        )
              < 'SUB 'P < 'C 0 0 LENGTH(CADDR(EXP)) LENGTH(CADDR(EXP)) > >
              < 'POPJ 'P >
              'NIL
            >

case of NIL:

    ISNIL(EXP) →
        OUTFILE =
            < ! OUTFILE'
              < 'MOVEI 1 0 >
            >

case of T:

    IST(EXP) →
        OUTFILE =
            < ! OUTFILE'
              < 'MOVEI 1 < 'QUOTE 'T > >
            >

ISNUMBER case:

    ISNUMBER(EXP) →
        OUTFILE =
            < ! OUTFILE'
              < 'MOVEI 1 < 'QUOTE EXP > >
            >

ISIDENTIFIER case:

    ISIDENTIFIER(EXP) →
        OUTFILE =
            < ! OUTFILE'
              < 'MOVE 1 RETRIEVE(EXP,M,LOCTABLE,OUTFILE') 'P >
            >


AND (no arguments) subcase:

    ISAND(EXP) ∧ NULL(CDR(EXP)) →
        OUTFILE =
            < ! OUTFILE'
              < 'MOVEI 1 < 'QUOTE 'T > >
              < 'JRST 0 L2 >
              L1
              < 'MOVEI 1 0 >
              L2
            >

AND (n > 0 arguments) subcase:

    OUTFILE =
        < ! OUTFILE'
          ! FCOMPEXP(EXP,M,LOCTABLE)
        >

    ISAND(EXP) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPANDOR(CDR(EXP),M,L1,FALSE,LOCTABLE)
              < 'MOVEI 1 < 'QUOTE 'T > >
              < 'JRST 0 L2 >
              L1
              < 'MOVEI 1 0 >
              L2
            >

    ISAND(EXP) ∧ NOT NULL(CDR(EXP)) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPEXP(CADR(EXP),M,LOCTABLE)
              < 'JUMPE 1 L1 >
              ! FCOMPEXP(< 'AND ! CDDR(EXP) >,M,LOCTABLE)
            >

OR (no arguments) subcase:

    ISOR(EXP) ∧ NULL(CDR(EXP)) →
        OUTFILE =
            < ! OUTFILE'
              < 'JRST 0 L1 >
              L4
              < 'MOVEI 1 < 'QUOTE 'T > >
              < 'JRST 0 L2 >
              L1
              < 'MOVEI 1 0 >
              L2
            >

OR (n > 0 arguments) subcase:

    OUTFILE =
        < ! OUTFILE'
          ! FCOMPEXP(EXP,M,LOCTABLE)
        >

    ISOR(EXP) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPANDOR(CDR(EXP),M,L4,TRUE,LOCTABLE)
              < 'JRST 0 L1 >
              L4
              < 'MOVEI 1 < 'QUOTE 'T > >
              < 'JRST 0 L2 >
              L1
              < 'MOVEI 1 0 >
              L2
            >

    ISOR(EXP) ∧ NOT NULL(CDR(EXP)) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPEXP(CADR(EXP),M,LOCTABLE)
              < 'JUMPN 1 L4 >
              ! FCOMPEXP(< 'OR ! CDDR(EXP) >,M,LOCTABLE)
            >


NOT case:

    OUTFILE =
        < ! OUTFILE'
          ! FCOMPEXP(EXP,M,LOCTABLE)
        >

    ISNOT(EXP) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPEXP(CADR(EXP),M,LOCTABLE)
              < 'JUMPN 1 L1 >
              < 'MOVEI 1 < 'QUOTE 'T > >
              < 'JRST 0 L2 >
              L1
              < 'MOVEI 1 0 >
              L2
            >


COND (no arguments) subcase:

    ISCOND(EXP) ∧ NULL(CDR(EXP)) →
        OUTFILE =
            < ! OUTFILE'
              L5
            >

COND (n > 0 arguments) subcase:

    OUTFILE =
        < ! OUTFILE'
          ! FCOMPEXP(EXP,M,LOCTABLE)
        >

    ISCOND(EXP) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMCOND(CDR(EXP),M,L5,LOCTABLE)
            >

    ISCOND(EXP) ∧ NOT NULL(CDR(EXP)) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPEXP(CAADR(EXP),M,LOCTABLE)
              < 'JUMPE 1 L3 >
              ! FCOMPEXP(CADADR(EXP),M,LOCTABLE)
              < 'JRST L5 >
              L3
              ! FCOMPEXP(< 'COND ! CDDR(EXP) >,M,LOCTABLE)
            >

QUOTE case:

    ISQUOTE(EXP) →
        OUTFILE =
            < ! OUTFILE'
              < 'MOVEI 1 EXP >
            >


case of a function call:

    ISFUNCTIONCALL(EXP) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPLIS(CDR(EXP),M,LOCTABLE)
              ! FLOADAC(1-LENGTH(CDR(EXP)),1)
              < 'SUB 'P < 'C 0 0 LENGTH(CDR(EXP)) LENGTH(CDR(EXP)) > >
              < 'CALL LENGTH(CDR(EXP)) < 'E CAR(EXP) > >
            >


case of a lambda:

    OUTFILE =
        < ! OUTFILE'
          ! FCOMPEXP(EXP,M,LOCTABLE)
        >

    ISLAMBDA(EXP) →
        OUTFILE =
            < ! OUTFILE'
              ! FCOMPLIS(CDR(EXP),M,LOCTABLE)
              ! FCOMPEXP(CADDAR(EXP),
                         M-LENGTH(CDR(EXP)),
                         ADDIDS(LOCTABLE,CADAR(EXP),1-M)
                        )
              < 'SUB 'P < 'C 0 0 LENGTH(CDR(EXP)) LENGTH(CDR(EXP)) > >
            >


4.2  A Simple  Example  Proof 

As further evidence (beyond the proof of MCO) of the applicability of these techniques
for compiler proving, we will use them in a proof of the compiler first proved by McCarthy
and Painter [McCarthy67]. The proof is expected to serve also as a more easily
comprehended example of how the collection of compiler proof techniques fits together into an
integral whole. The simplicity of the McCarthy-Painter compiler allowed us to carry out
nearly the entire proof interactively rather than by hand. The McCarthy-Painter compiler
produces few types of target language instructions, and small numbers of instructions. The
verification conditions were correspondingly small, all the more so because of the simple
statement of correctness. These facts allowed the human to compensate for many of the features
required for compiler proving (such as those described in Section 7.2) that are lacking in the
Xivus system. The proof serves as evidence that the approach taken to proving MCO,
particularly the part two proof, is well adapted to an interactive program proving environment.
The proof was done on a slightly modified version of the Xivus program verification system,
except that the target code was parsed by hand. The modifications made to the system were
to cause the verification condition generator to indicate where substitutions were to be made
rather than actually doing the substitutions. This was required (for the part two proof only)
because the assertions on which the substitutions were to be made were of course represented
symbolically rather than being the assertions themselves.

Following is a listing of the McCarthy-Painter compiler after translating it into slightly
extended Pascal. It has been made a procedure in order to have a variable parameter
(OUTFILE) onto which target code is added as it is produced, as described in Section 3.2.

PROCEDURE COMPILE(E : LIST; T : INTEGER; MAP : LIST;
                  VAR OUTFILE : FILE);
ENTRY ISEXPRESSION(E);
EXIT [Use assertion for each syntactic type case here];
BEGIN
  IF ISCONST(E) THEN OUTFILE := RIGHTCONS(OUTFILE, MKLI(VAL(E)))
  ELSE IF ISVAR(E) THEN OUTFILE := RIGHTCONS(OUTFILE, MKLOAD(LOC(E,MAP)))
  ELSE IF ISSUM(E) THEN BEGIN
    COMPILE(S1(E), T, MAP, OUTFILE);
    OUTFILE := RIGHTCONS(OUTFILE, MKSTO(T));
    COMPILE(S2(E), T+1, MAP, OUTFILE);
    OUTFILE := RIGHTCONS(OUTFILE, MKADD(T));
  END;
END;

ISEXPRESSION  is  a predicate  which  holds  if  and  only  if  its  argument  is  a legal  source 
language  expression. 
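
For readers who wish to experiment with the compiler, the listing can be transcribed roughly
as follows. This Python sketch is an illustration only and is not the verified Pascal text. The
tuple encodings of source expressions and target instructions, the dictionary used for MAP,
and the name compile_expr are all assumptions introduced here.

```python
# Assumed encodings (not from the original listing):
#   source:  ("const", n) | ("var", name) | ("sum", e1, e2)
#   target:  ("LI", v) | ("LOAD", loc) | ("STO", loc) | ("ADD", loc)
def compile_expr(e, t, loc_map, outfile):
    """Append the target code for e to outfile; t is the next free temporary."""
    kind = e[0]
    if kind == "const":
        outfile.append(("LI", e[1]))                  # MKLI(VAL(E))
    elif kind == "var":
        outfile.append(("LOAD", loc_map[e[1]]))       # MKLOAD(LOC(E,MAP))
    elif kind == "sum":
        compile_expr(e[1], t, loc_map, outfile)       # COMPILE(S1(E),T,...)
        outfile.append(("STO", t))                    # MKSTO(T)
        compile_expr(e[2], t + 1, loc_map, outfile)   # COMPILE(S2(E),T+1,...)
        outfile.append(("ADD", t))                    # MKADD(T)
    else:
        raise ValueError("not a legal source expression")
    return outfile


# Example: x + (3 + y), with temporaries starting at location 10.
code = compile_expr(
    ("sum", ("var", "x"), ("sum", ("const", 3), ("var", "y"))),
    10, {"x": 0, "y": 1}, [])
```

As in the Pascal version, the output list is extended on the right as code is produced, and
the only stores issued are to location t or above.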

The functions MKLI, MKLOAD, MKSTO, and MKADD represent the abstractions of
four types of target language instructions. The part two proof will use Hoare proof rules that
work directly with these abstractions, unlike MCO in which we dealt with the instructions
themselves. The function VAL(E) gives the numerical value of a constant E expressed in
source language, while LOC(E,MAP) gives the target machine location of a source language
variable E by referring to the symbol table MAP. These functions are equivalent to the
substitutions

        c                  getv(MAP)
      E        and       E
        k                  getw(MAP)

respectively, where c is the list of acceptable source language constants, k is the corresponding
list of values (that is, the ith item in k is VAL of the ith item in c), getv extracts the list of all
source language variables in MAP (in the order that LOC searches MAP, if redeclaration is
important), and getw extracts the corresponding list of their locations. In order that the
substitution may be sequential rather than just simultaneous, we assume that constant values
are distinguishable from source language constants (c and k have no intersection), and
variables and their locations are distinguishable. Further we will assume that no source
constants can occur in the notation used for target locations, no source variables can occur in
the notation used for constant values, and that source variables and constants are
distinguishable. These assumptions will allow us to interchange certain of these substitutions
easily during the part two proof. This distinguishability could of course be accomplished by
use of some sort of special symbols to express any sets that would otherwise violate these
assumptions.

As explained in Section 3.13, the Lisp-like function RIGHTCONS will be defined by
axioms introduced into the theorem prover describing how RIGHTCONS relates to other
Lisp-like functions which will arise from the assertions. The predicates ISCONST, ISVAR,
and ISSUM are the definitions of the source language syntactic types, and so will be defined
by axioms used in the theorem prover, as described in Section 3.14. Similarly the analytic
syntactic functions S1 and S2 (used in the compiler) will be described by axioms later.

We  now  write  the  assertions  for  the  part  one  proof  (proving  what  the  compiler 
produces)  in  the  list  notation  described  in  Section  3.5.  The  compiler  has  no  loops  and 
therefore  needs  no  internal  assertions.  For  the  reasons  given  above,  all  the  subsidiary 
functions  will  be  later  defined,  and  thus  will  simply  be  given  TRUE  Entry  and  Exit  assertions 
now.  Therefore  the  only  assertions  needed  for  the  compiler  are  the  Entry  and  Exit  of  the 
main  procedure.  Because  we  will  carry  out  each  case  separately,  we  will  require  three  Exit 
assertions. 


1)  ISCONST(E) →  OUTFILE = < ! OUTFILE'
                              MKLI(VAL(E))
                            >

2)  ISVAR(E)   →  OUTFILE = < ! OUTFILE'
                              MKLOAD(LOC(E,MAP))
                            >

3)  ISSUM(E)   →  OUTFILE = < ! OUTFILE'
                              ! FCOMPILE(S1(E),T,MAP)
                              MKSTO(T)
                              ! FCOMPILE(S2(E),T+1,MAP)
                              MKADD(T)
                            >

As before, the concluding prime indicates initial value.
In addition we will have the assertion involving the F function FCOMPILE, as
explained in Section 3.16.

    OUTFILE = < ! OUTFILE'
                ! FCOMPILE(E,T,MAP)
              >

This assertion shows the relation that we wish to exist between the final value of OUTFILE
and FCOMPILE; that relation is that FCOMPILE is exactly the target code added to
OUTFILE during an execution of COMPILE. During the part one proof we will assume certain
theorems involving FCOMPILE, and those assumptions will constitute the definition of
FCOMPILE.

We now mechanically translate the list notation assertions to Pascal using Lisp-like
functions to construct the lists. The assertions for each syntactic case are inserted in turn into
the compiler code, and each case is then submitted to the Xivus program verification system to
complete the part one proof of the compiler. Four verification conditions were produced for
the ISCONST case (because there are four execution paths in the compiler), and the following
axiom was required for their proof.

LISSUM1:

    ISSUM(X.)
      → ISEXPRESSION(S1(X.))
        ∧ ISEXPRESSION(S2(X.))

LISSUM1 is a name for the axiom which is used by the theorem prover. A dot after a
variable name indicates universal quantification of that variable over the entire axiom. The
indentation is used to indicate the precedence of the operations. That is, in the axiom above,
the principal operator is → (implies), while the ∧ (and) simply joins two conclusions.

The ISVAR case produced four new verification conditions which required LISSUM1
and the following axiom for their proof.

LISVAR1:

    ISVAR(X.) → NOT ISCONST(X.)

The ISSUM case produced four new verification conditions which required LISSUM1
and the following axioms and rewrite rules for their proof.

LISSUM2:

    ISSUM(X.) → NOT ISCONST(X.)

LISSUM3:

    ISSUM(X.) → NOT ISVAR(X.)

LISEXPRESSION1:

    NOT ISCONST(X.)
      ∧ NOT ISVAR(X.)
      ∧ NOT ISSUM(X.)
      → NOT ISEXPRESSION(X.)

RCONS1:

    APPEND(XX., CONS(YY., ZZ.))
      --> APPEND(RIGHTCONS(XX., YY.), ZZ.)

RCONS2:

    APPEND(XX., 'NIL) --> XX.

RCONS3R:

    APPEND(XX., APPEND(YY., ZZ.))
      --> APPEND(APPEND(XX., YY.), ZZ.)

The axioms, whose labels begin with the letter L, are those required to define the source
language syntax as explained in Section 3.14. The rewrite rules represent the axiomatic
description of the Lisp-like functions as explained in Section 3.13. The dashed right arrow
(-->) indicates that the pattern to its left is replaced by the form to the right. Variables in
rewrite rules followed by dots are ones that may be bound to any expression to cause a match
of the left side.
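
The three rewrite rules are ordinary identities on lists, which can be checked on concrete
values. The following Python sketch models Lisp lists as Python lists; the helper names cons,
rightcons, append, and check_rules are assumptions introduced here for illustration and are
not part of the theorem prover.

```python
def cons(y, zs):
    """CONS: add item y at the front of list zs."""
    return [y] + zs


def rightcons(xs, y):
    """RIGHTCONS: add item y at the right end of list xs."""
    return xs + [y]


def append(xs, ys):
    """APPEND: concatenate two lists."""
    return xs + ys


def check_rules(xx, yy, zz, item):
    """Check that both sides of each rewrite rule agree on the given values."""
    rcons1 = append(xx, cons(item, zz)) == append(rightcons(xx, item), zz)
    rcons2 = append(xx, []) == xx
    rcons3r = append(xx, append(yy, zz)) == append(append(xx, yy), zz)
    return rcons1 and rcons2 and rcons3r
```

A check on sample values is of course not a proof; it only illustrates that each rule replaces
an expression by an equal one.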

During the part one proof we have assumed, in the theorem prover, several theorems
about the F function FCOMPILE, as explained in Section 3.16. The assumptions state that
FCOMPILE is exactly the target code added to the initial value of OUTFILE, as specified in
the Exit assertions for the part one proof. Those assumptions, written in the list notation, are
as follows.

ISCONST(E):

    < ! OUTFILE'
      ! FCOMPILE(E,T,MAP)
    >
  =
    < ! OUTFILE'
      MKLI(VAL(E))
    >

ISVAR(E):

    < ! OUTFILE'
      ! FCOMPILE(E,T,MAP)
    >
  =
    < ! OUTFILE'
      MKLOAD(LOC(E,MAP))
    >

ISSUM(E):

    < ! OUTFILE'
      ! FCOMPILE(E,T,MAP)
    >
  =
    < ! OUTFILE'
      ! FCOMPILE(S1(E),T,MAP)
      MKSTO(T)
      ! FCOMPILE(S2(E),T+1,MAP)
      MKADD(T)
    >

The statement of correctness of this compiler is:

      R1
    A                 { FCOMPILE(E,T,MAP) }   A
        c  v
      E
        k  w


where v stands for getv(MAP), w stands for getw(MAP), c and k are as above, and R1 is the
accumulator. This is equivalent to the statement of correctness of MCO when the following
facts are taken into account. The Pre and Post conditions on the ISCONST and ISVAR
syntactic types are TRUE, as they were for the corresponding types in MCO. In the ISSUM
case the Pre and Post are conjunctions consisting of the Entry and Exit of the function plus
(both TRUE) and the Pres and Posts of the arguments. By induction on the depth of nesting
of syntactic types, the Pre and Post conditions of all expressions are TRUE. No registers
other than the accumulator are wiped out, so N(S) is always one. Where MCO had only one
source language constant requiring translation to target language, the McCarthy-Painter
compiler requires translation of all constants; hence we have the c-k substitution instead of the
NIL-0 one.

This compiler uses a run-time stack for one of the two purposes that MCO did, namely
that of storing temporary values during evaluation of expressions. The other use of the stack,
that of storing variables, is not needed in this compiler because its source language does not
have any dynamic allocation of storage. As with MCO, the Hoare formula describing the
correctness of this compiler is only valid if certain facts are true about the stack. Namely, we
have assumed that the stack, which we will represent as array m, has not been changed across
execution of the target code in FCOMPILE, at least up to (but not including) m[T]. Here T
is the compiler variable pointing to the next available location in the stack. Now only four
target language instructions are produced by this compiler, and the one represented by
MKSTO is the only one that can affect the stack m. Since all uses of MKSTO in the
compiler use T as the location in m to be set, we need only show that T never holds a lower
value than its initial value. The only place T is changed is by recursively calling the compiler
with T+1 or T as T's new value. By induction on the depth of such calls, we can easily show
that T is always greater than or equal to the initial value of T. Thus we have escaped the
need to set up axioms describing the action of target language instructions on the stack to
effect a stackok type of proof.

We may note here that T is the parallel of P, the stack pointer of MCO, except that P
changes at execution time (every time a function is entered, by means of the PUSH
instructions placing the newly allocated arguments onto the stack). This allows MCO's
compiled code to be re-enterable, which is required for recursive functions to be in the source
language. But T may be (and is, in fact) computed at compile time, since the target code
produced is not re-enterable.

We also note another difference between MCO and the McCarthy-Painter compiler
here. The latter uses ISCONST, ISVAR, and ISSUM to test the expression to be compiled to
determine which syntactic type it is. MCO, however, uses easier tests to determine syntactic
type. This is possible because we may allow certain non-expressions to pass a test or because
we know several things that the expression is not by virtue of the types for which we have
already tested earlier in the code of the compiler. Use of these simplified tests required us to
state and use more source syntax axioms to describe what each given syntactic type of MCO
was in terms of the functions used in the compiler code tests.

We will now do the part two proof, that of showing that the results of compiling
(FCOMPILE) always satisfy the Hoare formula statement of correctness given above, and
therefore the target code produced by the compiler has the same semantics as the source code
that was compiled. We will do this by symbolic verification condition generation back
through the code of FCOMPILE (which we know from the part one proof), and then prove
the resulting verification condition. To do this we will use the following Hoare rules
describing the abstractions of the target language instructions.

        R1
      Q             { MKLI(V) }     Q
        V

        R1
      Q             { MKLOAD(I) }   Q
        m[I]

        m
      Q             { MKSTO(I) }    Q
        α(m,I,R1)

        R1
      Q             { MKADD(I) }    Q
        m[I]+R1

These Hoare rules can be understood by viewing the target language instructions as
assignment statements. MKLI assigns a value to R1, MKLOAD assigns the value in a given
stack location to R1, MKSTO is the assignment of the value in R1 to a given stack location,
and MKADD assigns to R1 the sum of a given stack location and itself.
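
Read as assignment statements, the four instruction abstractions can be executed directly.
The following Python sketch is a small interpreter written under that reading. It is an
illustration only: the tuple encoding of instructions, the dictionary model of the array m, and
the function name run are assumptions introduced here, not part of the proof.

```python
def run(code, m, r1=0):
    """Execute abstract target instructions; return the final (R1, m)."""
    m = dict(m)                  # work on a copy so the caller's m is unchanged
    for op, arg in code:
        if op == "LI":           # MKLI(V):    R1 := V
            r1 = arg
        elif op == "LOAD":       # MKLOAD(I):  R1 := m[I]
            r1 = m[arg]
        elif op == "STO":        # MKSTO(I):   m := alpha(m, I, R1)
            m[arg] = r1
        elif op == "ADD":        # MKADD(I):   R1 := m[I] + R1
            r1 = m[arg] + r1
        else:
            raise ValueError("unknown instruction: %r" % (op,))
    return r1, m


# Hand-written target code for x + (3 + y), with x at location 0,
# y at location 1, and temporaries starting at location 10.
code = [("LOAD", 0), ("STO", 10), ("LI", 3), ("STO", 11),
        ("LOAD", 1), ("ADD", 11), ("ADD", 10)]
m0 = {0: 4, 1: 5}
result, m1 = run(code, m0)
```

With x = 4 and y = 5 the accumulator ends holding the value of the source expression, and
the locations below the first temporary are left as they were, which is the behavior the
statement of correctness describes.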
We must also have a Hoare rule for FCOMPILE itself because we will encounter
FCOMPILE of the components of the code that we are presently proving. These represent
the recursive calls to the compiling procedure. By structural induction we may use the
statement of correctness itself as a Hoare rule for these smaller components.

In order to carry out this proof interactively on the available Xivus system, the target
code of each syntactic type case was parsed by hand to a Pascal statement that had an
equivalent Hoare rule. The outline of an interactive system for carrying out more complex
part two proofs is given in Section 7.2.

The part two proof of the ISCONST case required the use of a subrule and a rewrite
rule. The former is a generalization of subrule 4 (which we shall call subrule 4b), and the
latter (which we shall call MP1) expresses the relationship between the c-k substitution and
the function VAL.

Subrule 4b:

                                      D1 ... Dn
    Di ∉ G (for 1 ≤ i ≤ n)   →   G               =  G
                                      H1 ... Hn

MP1:

                             c
    if ISCONST(X.) then X.       -->  VAL(X.)
                             k

For the ISVAR case, the part two proof required subrule 4b and the following rewrite
rule that expresses the relationship between the getv-getw substitution and the function LOC.

MP2:

                           getv(MAP)
    if ISVAR(X.) then X.               -->  m[LOC(X.,MAP)]
                           getw(MAP)

In addition to subrules 1, 2, 3, 4, 5, 6, 8, 9, 12, 13, and 21, the following axioms and
rewrite rules were required for the ISSUM case of the part two proof. Discussions of them
follow.

MP3:

        m
      A               -->  A
        α(m,T,X.)

MP4:

            c  getv(MAP)  m                        c  getv(MAP)
      S2(E)                            -->   S2(E)
            k  getw(MAP)  α(m,T,X.)                k  getw(MAP)

LISSUM4:

    ISSUM(X.) → X. = S1(X.) + S2(X.)

Subrule 6b:

                    DD            DD           DD
      (f C1 ... Cn)     =  (f  C1     ...   Cn     )
                    HH            HH           HH

where DD is a list of D's, HH is a list of H's, and f is any source language function.

In MP3 the A is the A from the Hoare formula that is the statement of correctness of
the compiler. That is, A represents the assertion about the target language program that we
are using in the verification condition generation of the part two proof. Because T is defined
as pointing to the next available location in the stack, we know that A contains only references
to m with subscripts less than T. Therefore when the alpha substitution is applied to such
references to m, the alpha will always simplify back to m, that is, remain unchanged, because
of the alpha simplification rule: α(m,T,X)[I] becomes m[I] if I ≠ T.
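
The alpha simplification rule can be illustrated concretely. In the following Python sketch,
alpha(m, t, x) stands for the array that is like m except that location t holds x. The
dictionary model of the array and the function name are assumptions made here for
illustration only.

```python
def alpha(m, t, x):
    """Return a copy of array m with location t set to x; m itself is unchanged."""
    updated = dict(m)
    updated[t] = x
    return updated


# References at subscripts other than t see the original contents of m,
# which is why an assertion mentioning only m[I] with I < T is unaffected.
m = {0: 7, 1: 8}
a = alpha(m, 2, 99)
```

Here a[0] and a[1] agree with m[0] and m[1], and only a[2] differs, which is the situation
MP3 relies on when every reference in A has a subscript less than T.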

In MP4 we know that all previously declared variables have locations less than T, and
all locations to be used in compiling S2(E) are to be located in m[T+1] and beyond. As in the
MP3 rule, no references will ever occur to m[T], so the alpha will simplify the same way.
This suggests a possible further abstraction that could be defined, that of containing no
references to m[T]. Then a general rewrite rule could be written for dropping the alpha
substitution when that condition held.

We note that LISSUM4 is simply a definition of the type ISSUM in terms of its
arguments selected by S1 and S2. We have called the last axiom above subrule 6b because it
is the obvious generalization of subrule 6 to the case of a multiple substitution.

This  concludes  the  description  of  the  interactive  proof  of  the  McCarthy-Painter 
compiler. 


4.3  An  Example  Case  of  a More  Complex  Compiler 

Here we present the proof of one syntactic case of a more complex compiler. It is
intended to serve as evidence that the techniques given here for proving compilers are
applicable both to larger, less "toy-like" compilers, and to statement-oriented source languages.
The example we have chosen is the compiler for the language PL/0 given by Wirth [Wirth76,
pp. 337-347]. The language PL/0 contains such features as assignment statements, begin-end
statement grouping, the conditional (if) statement, and the WHILE repetition statement. Block
structured declaration of variables and recursive procedures are included, though without
formal parameters. The only data type is integer, and the usual arithmetic and relational
operators are supplied. The compiler produces target code for a hypothetical stack-oriented
machine. The stack is used for declared variables upon each entry to a recursive procedure
and also for temporary storage used during evaluation of expressions. More complete
descriptions of PL/0 and the compiler (in fact, a derivation of the compiler) are found in the
above reference [Wirth76, pp. 307-336].

The WHILE statement is the syntactic type case selected. Both the part one and part
two proofs were done by hand rather than interactively, part two for the same reasons as the
MCO proof, and part one for the reasons given below.

The following list of problems prevents a part one proof of the PL/0 compiler from
being carried out interactively on the Xivus system. Each is accompanied by an explanation
of how this problem could be overcome, often by rewriting the compiler code, but never by
actually changing the output of the compiler.

1. The compiler contains gotos that jump out of the procedure in which they lie. This
violates the assumption in proving the calling procedure that the called procedure
returns to the statement immediately after the call. It also causes the proof of the called
procedure to use an assertion (the one at the label outside the procedure) that may use
variable names out of their declared scope. Since all such gotos are used to "bail out" of
a procedure in case of an error, we may rewrite the compiler eliminating the gotos if we
assume correct source language input (ignore source language error detection) for our
compiler proof. We could also rewrite the compiler with an error flag as a var
parameter that would be checked by a conditional goto in the calling procedure upon
return of the called procedure.

2. The compiler uses inherited variables rather than passing them as parameters to called
procedures.  Although  it  would  result  in  lengthy  parameter  lists,  the  compiler  could  be 
rewritten  with  all  such  variables  passed  as  parameters.  Use  of  a system  using  the  Euclid 
style  of  procedure  call  rules  treating  global  variables  would  also  solve  this  problem. 

3.  Declared  constants,  WITH,  and  CASE  statements  are  used.  These  features  of  Pascal 
have not been implemented on Xivus, but can all be expanded easily in terms of other
Pascal  features  that  are  implemented. 

4.  A variable  named  CX  is  used  by  the  compiler  to  count  the  target  code  locations  as  they 
are  filled.  Exit  assertions  must  speak  of  several  intermediate  values  of  this  counter  in 
cases  where  several  addresses  are  significant  in  expressing  the  target  code  produced. 
The  Xivus  system  has  no  way  to  express  values  of  a var  parameter  other  than  the 
initial  and  final  values.  We  could  invent  a function  and  define  it  as  the  amount  by 
which  CX  is  incremented  in  compiling  various  pieces  of  code  in  order  to  express 
intermediate  values  of  CX  in  terms  of  the  initial  or  final  value,  but  we  feel  this  would 
result  in  cumbersome  notation.  Our  solution  for  the  proof  we  present  here  is  to  append 
a period  and  digit  to  express  successive  values  of  a variable.  This  notation  is  similar  to 
that used in such works as Ragland [Ragland73].

5.  Value  parameters  to  procedures  are  modified.  The  Xivus  system  treats  non-var 
parameters  as  constant,  that  is  incapable  of  being  modified,  while  the  version  of  Pascal 
in which the PL/0 compiler was written treats them as changeable, though the change is
not  reflected  in  the  calling  procedure.  Thus  we  must  rewrite  the  compiler  to  use  local 
variable  copies  of  non-var  parameters  which  we  wish  to  change. 

6. Some variables are used only by the lowest level character reading routines (procedures
GETCH  and  GETSYM)  in  order  to  buffer  a line  of  input  at  a time.  In  order  to 
simplify  notation  these  variables  should  be  treated  as  own  variables  or  Euclid  module 
variables,  that  is,  ones  that  are  internal  only  to  the  lower  level  routines,  but  whose 
values  must  be  preserved  from  one  call  to  the  next.  This  would  eliminate  the 
appearance  of  those  variables  in  any  of  the  higher  level  routines  that  call  GETSYM. 
The  interface  (which  must  be  expressed  in  Entry  and  Exit  assertions)  between  the 
higher  and  lower  level  routines  would  be  simplified  nearly  to  the  point  of  simply  saying 
that GETSYM gets the next symbol from the input file. However, Xivus has no way of
treating  an  own  variable. 

The above reasons convinced us that it was impractical to use the available machine
assistance for the part one proof of the PL/0 compiler, although it probably would have been
worthwhile if the entire compiler were to be proved rather than just one syntactic type case.
In  essence  the  part  one  proof  was  carried  out,  however,  by  symbolically  executing  the  code  of 
the  compiler  with  symbols  representing  the  syntactic  parts  contained  within  the  WHILE 
statement.  It  was  assumed  during  this  execution  that  no  errors  were  found  and  that 
GETSYM correctly sets the variables SYM, ID, and NUM with the next symbol and
identifier or number, respectively. Thus procedures GETSYM and GETCH have not been
proved  in  any  manner  in  this  proof. 

The following list highlights the differences between the PL/0 compiler and the others
to  which  we  have  applied  our  proving  methods.  In  all  cases  we  discuss  how  these  differences 
may  be  accommodated  into  our  proving  methods. 

1. The compiler will not successfully compile programs if certain limits are exceeded, for
example identifier table size, depth of nesting of procedures, and size of resulting code.
We  may  assume  for  purposes  of  our  proof  that  these  limits  are  not  exceeded,  but  should 
supply  a separate  proof  (which  should  be  a relatively  straightforward  inductive 
assertion  proof)  that  if  these  limits  are  exceeded  then  the  compiler  notifies  the  user  that 
the  compilation  is  not  to  be  accepted  as  correct.  We  could  then  state  that  the  compiler 
was  proved  correct  under  the  condition  that  no  compilation  error  messages  were 
produced. 

2.  Actual  numeric  addresses  are  used  in  the  target  language  rather  than  symbols  for  labels. 
This means that forward gotos often have to be created without the proper address in
them, and then have to be "fixed" later when the address has been determined.
Fortunately the compilation of a complete source language syntactic type is never
produced  with  an  address  still  awaiting  the  "fixing"  operation.  Therefore  we  have  no 
problem  defining  what  the  compilation  of  a given  syntactic  type  is  for  the  part  one  or 
part  two  proofs.  However,  we  will  have  to  mark  points  in  the  target  code  produced  that 
correspond to locations to which gotos may jump. Then our part two proof will treat
these  marks  as  labels,  even  though  the  target  code  itself  does  not  use  labels.  It  might  be 
noted  that  this  treatment  by  the  PL/0  compiler  avoids  the  need  for  gensym  to  create 
labels. 

3.  There  is  no  "top-level"  to  this  compiler  that  checks  all  (or  nearly  all)  pieces  of  source 
language  to  determine  their  syntactic  types,  then  calls  the  appropriate  parts  of  the 
compiler  for  those  types.  It  is  a more  efficient  compiler  design  in  that  it  can  determine 
the  syntactic  type  of  any  piece  by  one  symbol  look-ahead  (this  is  possible  by  clever 
source  language  design),  given  the  knowledge  of  what  immediate  context  that  piece  is  in. 
Then the appropriate compiler procedure is called without returning to a top-level,
where  the  immediate  context  would  probably  be  lost.  This  means  that  the  F function 
(which  we  will  call  FCOMPILE)  that  represents  the  target  code  produced  by  compiling 
the  various  syntactic  types  will  be  defined  not  in  terms  of  one  procedure  (as 
FCOMPEXP was) but in terms of the code produced by different major compiler
procedures  for  each  different  syntactic  type. 

4.  The  one  symbol  look-ahead  mechanism  requires  several  extra  parameters  in  the 
compiler, such as the last character read, the last symbol read, the last identifier read,
etc.  In  order  to  carry  out  the  part  one  proof  it  will  be  necessary  to  assert  that  upon 
beginning  and  ending  the  compilation  of  each  source  language  syntactic  type,  all  such 
variables  have  been  carefully  kept  current  with  the  proper  values.  Thus  the  assertions 
for  part  one  will  contain  somewhat  more  information  than  in  the  MCO  proof. 

5. Several loops occur in this compiler, while MCO had none. This simply means that in
addition to Entry and Exit assertions, some internal assertions must be made to "break"
the  loops. 

6.  There  are  some  errors  in  the  compiler  as  published.  In  the  course  of  symbolically 
executing  the  code,  two  misspellings  and  the  use  of  an  undeclared  (and  apparently 
unnecessary) variable were found. The use of the previously proved compilers MCO
and the McCarthy-Painter compiler reduced the chances of finding errors in those
instances. However, the errors found in the PL/0 compiler underscore the need in
interactive  program  proving  systems  for  capabilities  to  reprove  a slightly  changed 
program  in  a manner  that  makes  efficient  use  of  all  previous  work. 

7. Output of the target code is done by placing the code into an array with the use of a
pointer to the array instead of using a file in which to write the code. This is required
because the already written portion of the array must be accessible later for goto address
fixing. The only effect this has on our proof is that the F function FCOMPILE must
be defined as the code lying in the output array (CODE) between the initial and final
value  of  the  array  pointer  variable  (CX). 

8.  Input  is  accomplished  as  a hybrid  of  file,  array,  and  variables  because  of  the  need  to 
look-ahead  one  symbol  and  the  desire  to  buffer  a line  ahead.  This  complicates  our 
denoting  "the  present  statement"  (or  expression)  in  our  assertions.  We  will  solve  this  by 
inventing  a function,  which  we  will  call  PRESENT,  that  combines  either  the  variable 
holding  the  last  identifier  read  or  the  variable  holding  the  last  number  read  or  a source 
language  keyword  or  symbol  (depending  on  the  last  symbol  indicator  SYM)  with  the 
remainder  of  the  line  buffer  and  the  remainder  of  the  input  file  to  give  us  the  entire 
present statement (or expression). Because the present statement may be nested inside
another statement or be followed by another statement, we may get more statements or
tails of statements from the remaining part of the input file. So PRESENT will also
remove  any  such  tails  to  give  us  exactly  the  present  statement  or  expression.  The  F 
function  FCOMPILE  will  then  be  defined  in  terms  of  the  present  statement  or 
expression  as  selected  by  PRESENT. 

9.  In  compiler  function  POSITION  a programming  shortcut  is  used  that  will  unnecessarily 
complicate  the  proof.  Instead  of  having  two  conditions  for  exiting  the  loop,  that  of 
searching the entire table or of finding the desired item in the table, only the finding
condition  is  used  after  the  desired  item  has  been  placed  at  the  end  (last  in  the  direction 
of  search)  of  the  table.  This  has  the  effect  of  requiring  all  assertions  about  items  being 
in  the  table  or  not  to  note  the  exception  of  the  last  position.  It  also  adds  a var 
parameter  (the  table)  to  the  function,  which  requires  it  to  be  rewritten  as  a procedure, 
and  which  complicates  all  verification  conditions  involving  POSITION.  In  short,  the 
abstraction  that  POSITION  represents  had  an  unnecessarily  messy  interface  because 
this  programming  trick  was  used.  We  would  therefore  rewrite  the  function  before 
proving. 
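
The shortcut criticized in item 9 can be made concrete with a small sketch. It is written in Python rather than in the compiler's Pascal, and the function names, the list-based table, and the convention that slot 0 is the reserved "last position" are assumptions chosen for illustration, not the published code. The first function modifies the table (hence the var parameter the text mentions); the second uses two exit conditions and leaves the table untouched.

```python
def position_sentinel(table, tx, ident):
    """Sentinel search: writes ident into slot 0, so the table is modified."""
    table[0] = ident              # item placed last in the direction of search
    i = tx
    while table[i] != ident:      # single exit condition: item found
        i -= 1
    return i                      # 0 means "not declared"


def position_two_exit(table, tx, ident):
    """Search with two exit conditions; the table is only read."""
    i = tx
    while i > 0 and table[i] != ident:
        i -= 1
    return i                      # 0 means "not declared"


# Hypothetical table: slot 0 is reserved, entries 1..3 are declared names.
table = [None, "a", "b", "a"]
found = position_two_exit(table, 3, "a")      # most recent declaration of "a"
missing = position_two_exit(table, 3, "z")    # not in the table
```

With the second form, assertions about table contents need no exception for slot 0, which is the simplification the rewrite is meant to buy.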

By symbolically executing the compiler we have determined that:

FCOMPILE( < 'WHILE C 'DO STMT > , TABLE, TX, CX, LEV, DX) =

    < CX.0:
          ! FCOMPILE( C , TABLE, TX, CX.0, LEV, DX)
      CX.1:
          < 'JPC 0 CX.4 >
      CX.2:
          ! FCOMPILE( STMT , TABLE, TX, CX.2, LEV, DX)
      CX.3:
          < 'JMP 0 CX.0 >
      CX.4:
    >

Note  that  the  implied  labels  have  been  shown,  as  promised.  The  WHILE  statement 
given represents the value of PRESENT for purposes of the proof of this syntactic type case,
while the value of PRESENT will be C at the time that a compiler routine is invoked to
compile the condition C, and will be STMT at the time that a routine is invoked to compile
the statement STMT. The context of the compiler is passed to FCOMPILE by the arguments
TABLE  (the  symbol  table),  TX  (the  symbol  table  size),  CX  (the  index  to  the  next  available 
target  code  location),  LEV  (the  nesting  level  of  the  present  procedure),  and  DX  (the  size  of 
stack  that  is  filled  with  locally  declared  variables). 
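
As an illustration of how the five implied labels arise from a code array and a code counter, the following Python sketch emits the WHILE layout and then fixes the forward jump once its target is known. The function `compile_while`, the callbacks standing in for the compilation of C and STMT, and the placeholder instructions they emit are all hypothetical; only the JPC/JMP layout is taken from the definition above.

```python
def compile_while(code, compile_cond, compile_stmt):
    """Append code for WHILE C DO STMT to `code`; return the five addresses."""
    cx0 = len(code)                    # CX.0: start of the condition code
    compile_cond(code)                 # splice in the code for C
    cx1 = len(code)                    # CX.1: location of the JPC
    code.append(["JPC", 0, None])      # forward jump, address not yet known
    cx2 = len(code)                    # CX.2: start of the body
    compile_stmt(code)                 # splice in the code for STMT
    cx3 = len(code)                    # CX.3: location of the JMP
    code.append(["JMP", 0, cx0])       # backward jump to re-test the condition
    cx4 = len(code)                    # CX.4: first location after the loop
    code[cx1][2] = cx4                 # "fix" the forward jump
    return cx0, cx1, cx2, cx3, cx4


# Stand-ins for compiled pieces (placeholder instructions only).
def cond(code):
    code.extend([["LOD", 0, 3], ["LIT", 0, 0], ["OPR", 0, 12]])


def body(code):
    code.extend([["LOD", 0, 3], ["LIT", 0, 1], ["OPR", 0, 3], ["STO", 0, 3]])


code = []
labels = compile_while(code, cond, body)
```

When `compile_while` returns, no instruction in the emitted range still awaits fixing, which is the property that lets FCOMPILE be defined as the contents of the array between the initial and final values of the counter.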

The statement of correctness of the compiler in the form of a Hoare formula will be
similar to that of MCO. For statements, we have corresponding to every source language
Hoare rule the target language Hoare formula which is exactly the same except that it uses
FCOMPILE  of  each  piece  of  code,  and  it  applies  the  * operation  to  all  assertions.  For 
example,  the  source  language  Hoare  rule 

P { S } Q 

would  require  the  proof  of  the  corresponding  target  language  Hoare  formula 

P * { FCOMPILE(S, TABLE, TX, CX, LEV, DX) } Q *

where  the  * notation  is  the  shorthand  for  the  two  multiple  substitutions  that  translate  source 
language  constants  to  the  corresponding  target  language  constants,  and  source  language 
variables  to  the  corresponding  target  language  locations.  Note  that  the  Hoare  formula  that  is 
the  statement  of  correctness  will  be  of  different  form  for  each  different  statement  type  because 
the  source  language  Hoare  rules  are  different  for  each  type.  This  makes  it  appear  more 
complicated  than  MCO,  which  had  only  one  statement  type.  However,  the  form  of  each  Hoare 
formula  will  be  simpler  than  in  MCO  in  that  there  need  be  no  mention  of  registers  (there  are 
none  in  the  PL/0  target  machine),  Pre  and  Post  conditions  of  expressions  (for  reasons 
explained  below),  or  function  values  (statements  in  PL/0  do  not  return  values). 

One  slight  complication  introduced  by  PL/0  is  the  use  of  symbolic  names  for  constants. 
As  these  are  declared,  they  are  placed,  along  with  their  values,  into  the  symbol  table.  If  we 
expand  the  definition  of  v (the  list  of  variables  obtained  from  the  symbol  table)  to  include  the 
constant  names,  and  expand  w (the  corresponding  list  of  variables’  locations)  to  include 
constants’  values,  then  the  multiple  substitution  of  w for  v will  still  translate  all  source 
language  identifiers  into  their  target  language  counterparts.  Note  that  we  cannot  do  this 
translation  with  a separate  list  for  the  constants  because  the  source  language  allows 
redeclaration  of  names  as  either  variables  or  constants,  and  so  the  order  of  declaration  must  be 
strictly  preserved  to  obtain  the  most  recent  one.  Note  that  this  requires  that  getv  and  getw 
(the functions to extract the lists v and w from the symbol table) must extract the lists in
reverse  order,  since  newly  declared  variables  are  added  to  the  end  of  the  table  in  this  compiler. 
Also  note  that  addresses  in  this  compiler  are  two-dimensional;  that  is,  they  have  a level  and  a 
location  within  that  level. 
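
A minimal sketch of why a single list in reverse declaration order is needed, written in Python under assumed data structures: the dictionary layout of a table entry and the helper `translate` are hypothetical, and only the names getv and getw come from the text. The first match in v is the most recent declaration, so a name redeclared as a constant correctly hides an earlier variable of the same name.

```python
def getv(table, tx):
    """Source identifiers, most recently declared first."""
    return [table[i]["name"] for i in range(tx, 0, -1)]


def getw(table, tx):
    """Target counterparts in the same order: a constant's value, or a
    variable's two-dimensional (level, location) address."""
    out = []
    for i in range(tx, 0, -1):
        entry = table[i]
        if entry["kind"] == "const":
            out.append(entry["val"])
        else:
            out.append((entry["level"], entry["adr"]))
    return out


def translate(ident, v, w):
    """Substitution of w for v applied to one identifier: the first match
    in v is the most recent declaration."""
    return w[v.index(ident)]


# Hypothetical symbol table; slot 0 is unused, new entries go at the end.
table = [None,
         {"name": "n", "kind": "var", "level": 0, "adr": 3},
         {"name": "k", "kind": "const", "val": 10},
         {"name": "n", "kind": "const", "val": 7}]   # n redeclared as a constant
tx = 3
v, w = getv(table, tx), getw(table, tx)
```

Had constants been kept in a separate list, the relative order of the two declarations of `n` would be lost and the substitution could not tell which one is current.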

The definitions of c and k as lists to translate source language constants (not having
symbolic names) will remain the same as in the proof of the McCarthy-Painter compiler.
Thus the * notation will represent the substitutions (each upper list is replaced by the
list written below it):

$$
\Big|^{\,c}_{\,k}\ \Big|^{\,\mathrm{getv}(TABLE,\,TX)}_{\,\mathrm{getw}(TABLE,\,TX)}
$$


The Hoare formulas that are statements of correctness for expressions are slightly more
complicated  than  for  statements  because  we  must  deal  with  the  returned  value  of  the 
expression.  We  will  henceforth  refer  to  PL/0  expressions,  including  the  syntactic  types 
condition,  term,  and  factor  in  addition  to  expression,  as  general  expressions  to  distinguish 
them  from  the  narrowly  defined  syntactic  type  expression.  Since  there  are  no  registers  in  the 
target  machine,  general  expression  values  are  returned  in  the  next  available  location  in  the 
stack. The stack is denoted by S and the index to the last used location is T. With this
notation, the effect of executing a compiled general expression is equivalent to executing the
statements: S[T+1] := general-expression; T := T+1.

All  general  expressions  are  composed  of  constants,  variables,  and  the  arithmetic, 
relational,  and  odd  (as  opposed  to  even)  operators.  User  defined  functions  are  not  allowed  in 
PL/0. The Pre and Post conditions of constants and variables, as defined for the MCO
compiler in Figure 4-1, are TRUE. The Entry and Exit conditions for all the operators used
are also TRUE. By a simple induction argument (on the depth of nesting) we can show that
the Pre and Post conditions of any general expression are TRUE. This allows us to drop the
Pre and Post conditions from the Hoare formulas that are the statements of correctness for
general  expressions  and  statements,  as  mentioned  above.  The  result  for  general  expressions  is: 


$$
Q\,\Big|^{T}_{T+1}\,\Big|^{S}_{\alpha(S,\,T+1,\,E^{*})}
\;\{\, \mathrm{FCOMPILE}(E, TABLE, TX, CX, LEV, DX) \,\}\; Q
$$
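
The precondition in this formula is what the assignment rule yields when Q is pushed backward through T := T+1 and then through S[T+1] := general-expression. The Python sketch below checks that reading on one concrete state. The functional update `alpha`, the helper names, the example assertion Q, and the use of a constant in place of the translated expression value are assumptions made for illustration.

```python
def alpha(S, i, v):
    """Return a copy of array S with element i replaced by v."""
    out = list(S)
    out[i] = v
    return out


def precondition(Q, value):
    """Q with T+1 substituted for T and alpha(S, T+1, value) for S."""
    return lambda S, T: Q(alpha(S, T + 1, value), T + 1)


def execute(S, T, value):
    """The two assignments S[T+1] := value; T := T+1."""
    S = list(S)
    S[T + 1] = value
    return S, T + 1


def Q(S, T):
    """An arbitrary example postcondition over the stack and its pointer."""
    return S[T] > S[1]


S0, T0, e = [0, 4, 9, 0, 0], 2, 7
pre = precondition(Q, e)(S0, T0)    # substituted assertion, evaluated before
S1, T1 = execute(S0, T0, e)
post = Q(S1, T1)                    # original assertion, evaluated after
```

The two values agree because evaluating the substituted assertion in the initial state is the same computation as evaluating Q in the final state; one state is of course only a spot check, not a proof of the rule.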


As  with  MCO,  we  assume  that  the  stack  is  preserved  (at  least  up  to  the  T position) 
during  the  execution  of  the  target  code  representing  any  source  language  syntactic  type.  But 
in  the  case  of  general  expression,  the  stack  pointer  T is  incremented  by  one  to  point  to  the 
location  used  for  the  expression  value.  Therefore  the  definition  of  stackok  remains  the  same 
(with  the  change  of  notation  for  the  stack  and  its  pointer),  but  in  addition  we  need  to  define  a 
new property. We will call it stackplus1 and use it to mean stackok except that T is one
greater upon returning. Because the axioms about stackok are specific to the target language,
we will have to redefine the axioms. We will also need a new axiom to relate stackplus1 to
stackok.  The  fact  that  the  target  code  has  loops  in  it  means  further  change  to  the  axioms,  but 
the  stackok  proof  still  remains  quite  similar  to  that  of  MCO. 
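
To make the two properties concrete, here is a small Python sketch of a stack machine with just enough instructions to run a compiled expression followed by a conditional jump. The instruction set is a simplification assumed for the example (a single ADD stands in for the machine's arithmetic operator instruction), and the functions `stackok` and `stackplus1` test the properties on one initial state only; they are not the axiomatic definitions used in the proof.

```python
def run(code, S, T):
    """Run (op, arg) instructions on a copy of stack S; T is the last used index."""
    S = list(S)
    pc = 0
    while pc < len(code):
        op, a = code[pc]
        pc += 1
        if op == "LIT":            # push the constant a
            T += 1
            S[T] = a
        elif op == "ADD":          # replace the top two items by their sum
            T -= 1
            S[T] = S[T] + S[T + 1]
        elif op == "JPC":          # abandon the tested item; jump to a if it was 0
            T -= 1
            if S[T + 1] == 0:
                pc = a
        elif op == "JMP":          # unconditional jump to a
            pc = a
        else:
            raise ValueError(op)
    return S, T


def stackok(code, S, T):
    """Stack preserved up to T and pointer unchanged, for this one state."""
    S1, T1 = run(code, S, T)
    return T1 == T and S1[:T + 1] == S[:T + 1]


def stackplus1(code, S, T):
    """As stackok, except that the pointer is one greater on return."""
    S1, T1 = run(code, S, T)
    return T1 == T + 1 and S1[:T + 1] == S[:T + 1]


S0, T0 = [0, 5, 6, 0, 0, 0], 2
expr = [("LIT", 2), ("LIT", 3), ("ADD", None)]   # leaves 2 + 3 in S[T+1]
tested = expr + [("JPC", 4)]                      # expression, then conditional jump
```

Here `expr` behaves as a general expression should (one new item, everything below it preserved), and appending the JPC brings the pointer back to its starting value.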

We  are  now  ready  to  tackle  the  part  two  proof  of  a WHILE  statement.  The  source 
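language Hoare rule for WHILE is:

$$
\frac{Q \land C \;\{\, \mathrm{STMT} \,\}\; Q, \qquad Q \land \lnot C \rightarrow R}
     {Q \;\{\, \langle\, \mathrm{WHILE}\ C\ \mathrm{DO}\ \mathrm{STMT} \,\rangle \,\}\; R}
$$

The target language Hoare formula which we must prove is then:

$$
\frac{Q^{*} \land C^{*} \;\{\, \mathrm{FCOMPILE}(\mathrm{STMT}, TABLE, TX, CX1, LEV, DX) \,\}\; Q^{*}, \qquad Q^{*} \land \lnot (C^{*}) \rightarrow R^{*}}
     {Q^{*} \;\{\, \mathrm{FCOMPILE}(\langle\, \mathrm{WHILE}\ C\ \mathrm{DO}\ \mathrm{STMT} \,\rangle, TABLE, TX, CX, LEV, DX) \,\}\; R^{*}}
$$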

What we must show for a rule of inference such as this is that the formula below the line is
provable  given  the  formulas  above.  It  might  be  noted  that  the  compiler  context  (TABLE, 
etc.)  of  two  different  FCOMPILEs  appearing  in  one  Hoare  formula  need  not  be  the  same. 
Careful  examination  of  the  compiler  will  show,  however,  that  TABLE,  TX,  LEV,  and  DX 
are  constant  over  the  code  of  a given  procedure.  Therefore  we  need  name  only  CX 
differently  in  the  Hoare  formula. 

To prove the Hoare formula for WHILE we will generate a verification condition from
the R * back through the target code of FCOMPILE. First note that assertion(CX.4) = R *.
Similarly assertion(CX.0) = Q *. We now apply the following Hoare rule for a JMP
(unconditional jump) statement.

$$
\mathrm{assertion}(L) \;\{\, \langle\, \text{'JMP}\ 0\ L \,\rangle \,\}\; Q
$$

The first premise of the Hoare formula we are proving allows us to proceed generating across
the FCOMPILE of STMT. The result is Q * ∧ C *. The JPC target language statement is
a conditional jump with the added twist that the tested item is removed from the stack (or
more precisely abandoned by backing up the pointer). Therefore the Hoare rule for JPC
looks like the combination of the Hoare rule for an assignment to the stack pointer T and that
of a conditional jump.


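$$
\Big( S[T] \rightarrow Q\,\Big|^{T}_{T-1} \Big) \land
\Big( \lnot S[T] \rightarrow \mathrm{assertion}(L)\,\Big|^{T}_{T-1} \Big)
\;\{\, \langle\, \text{'JPC}\ 0\ L \,\rangle \,\}\; Q
$$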


The  result  is: 


$$
\Big( S[T] \rightarrow ( Q^{*} \land C^{*} )\,\Big|^{T}_{T-1} \Big) \land
\Big( \lnot S[T] \rightarrow R^{*}\,\Big|^{T}_{T-1} \Big)
$$


The inductive assumption on smaller pieces of code allows us to generate across the
FCOMPILE of the condition C by applying the substitutions:

$$
\Big|^{T}_{T+1}\ \Big|^{S}_{\alpha(S,\,T+1,\,C^{*})}
$$

Then complete the verification condition generation by adding the initial assertion Q * as a
hypothesis. Distribute the substitutions over and, implies, not, and subscripting to get:

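$$
\begin{aligned}
Q^{*} \rightarrow{}
 & \Big( S\,\Big|^{T}_{T+1}\Big|^{S}_{\alpha(S,T+1,C^{*})}
     \Big[\, T\,\Big|^{T}_{T+1}\Big|^{S}_{\alpha(S,T+1,C^{*})} \Big] \rightarrow \\
 & \qquad Q^{*}\,\Big|^{T}_{T-1}\Big|^{T}_{T+1}\Big|^{S}_{\alpha(S,T+1,C^{*})}
     \land C^{*}\,\Big|^{T}_{T-1}\Big|^{T}_{T+1}\Big|^{S}_{\alpha(S,T+1,C^{*})} \Big) \land {} \\
 & \Big( \lnot\, S\,\Big|^{T}_{T+1}\Big|^{S}_{\alpha(S,T+1,C^{*})}
     \Big[\, T\,\Big|^{T}_{T+1}\Big|^{S}_{\alpha(S,T+1,C^{*})} \Big] \rightarrow \\
 & \qquad R^{*}\,\Big|^{T}_{T-1}\Big|^{T}_{T+1}\Big|^{S}_{\alpha(S,T+1,C^{*})} \Big)
\end{aligned}
$$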

Subrules 2f (remembering that S corresponds to m and T to P), 11, and 4 allow us to
drop the substitution of T+1 for T on S in both places. Then subrule 21 is applied to S and to T.
Distribution of S substitution over T+1, then subrules 2f, 11, and 4 results in the hypotheses
of the latter two implications becoming:

$$
\alpha(S,\,T+1,\,C^{*})\,[T+1]
$$

and

$$
\lnot\, \alpha(S,\,T+1,\,C^{*})\,[T+1]
$$

respectively. Apply subrules 9 and 21 and arithmetic simplification to reduce

$$
\Big|^{T}_{T-1}\Big|^{T}_{T+1}
$$

to the form

$$
\Big|^{T}_{T}
$$

which is dropped by subrule 14. The result is:


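$$
\begin{aligned}
Q^{*} \rightarrow{}
 & \Big( \alpha(S,T+1,C^{*})[T+1] \rightarrow
     Q^{*}\,\Big|^{S}_{\alpha(S,T+1,C^{*})} \land C^{*}\,\Big|^{S}_{\alpha(S,T+1,C^{*})} \Big) \\
 & \land \Big( \lnot\, \alpha(S,T+1,C^{*})[T+1] \rightarrow R^{*}\,\Big|^{S}_{\alpha(S,T+1,C^{*})} \Big)
\end{aligned}
$$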


But by simplification rules for α, we have α(S,T+1,C *)[T+1] = C *. Q is a source
language assertion and the * operation translates source variables into locations in stack S that
are at S[T] or before. The same applies to R and C. Changing the (T+1)st location of S will
have no effect on Q *, so the α for S substitutions may be dropped on Q *, R *, and C *.
The result is:

$$
Q^{*} \rightarrow ( C^{*} \rightarrow Q^{*} \land C^{*} ) \land ( \lnot C^{*} \rightarrow R^{*} )
$$

Simplify this logically to obtain:

$$
Q^{*} \land \lnot C^{*} \rightarrow R^{*}
$$

This  may  be  assumed  by  the  second  premise  of  the  Hoare  formula. 
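
The final logical simplification is purely propositional, so it can be checked exhaustively. The following Python sketch treats the three translated assertions as Boolean variables q, c, and r (an abstraction assumed for the check) and compares the verification condition with its simplified form on all eight assignments.

```python
from itertools import product


def implies(a, b):
    """Material implication."""
    return (not a) or b


def vc(q, c, r):
    """Q -> ((C -> Q and C) and (not C -> R))."""
    return implies(q, implies(c, q and c) and implies(not c, r))


def simplified(q, c, r):
    """(Q and not C) -> R."""
    return implies(q and not c, r)


equivalent = all(vc(q, c, r) == simplified(q, c, r)
                 for q, c, r in product([False, True], repeat=3))
```

The first conjunct of the verification condition is trivially true whenever q holds, which is why only the second premise of the WHILE rule is needed at this point.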

That concludes the proof of the Hoare formula, but we must still demonstrate the
stackok  property.  To  accommodate  the  loops  in  the  target  code  we  may  either  add  more 
complex  axioms,  such  as 

stackplus1(t1) ∧ stackok(t2) →
    stackok( < L1:
               ! t1
               < 'JPC 0 L2 >
               ! t2
               < 'JMP 0 L1 >
               L2:
             > )

or  else  find  all  paths  between  assertions  and  prove  the  stackok  property  for  each  path.  By 
induction  on  the  number  of  assertions  passed  during  an  execution,  we  know  any  path  through 
the  code  from  the  entrance  to  the  exit  has  the  stackok  property.  We  will  choose  the  latter 
because  the  above  axiom  is  so  messy  as  to  make  it  difficult  to  understand. 

We  will  unravel  the  paths  of  the  target  code  for  a WHILE  statement,  discarding  the 
jumps, and find two paths. One caution is that we cannot discard the part of a JPC
instruction that affects the stack. We will denote that part by a JPC' instruction. The
resulting paths are:

1)  < CX.0:  ! FCOMPILE( C , TABLE, TX, CX.0, LEV, DX)
      CX.1:  < 'JPC' 0 CX.4 >
      CX.2:  ! FCOMPILE( STMT , TABLE, TX, CX.2, LEV, DX)
    >

2)  < CX.0:  ! FCOMPILE( C , TABLE, TX, CX.0, LEV, DX)
      CX.1:  < 'JPC' 0 CX.4 >
    >


We now introduce the new axiom required to relate the stackplus1 property to stackok.

S10:  stackplus1(t1) → stackok( < ! t1 < 'JPC' 0 L > > )

This axiom says that a JPC' instruction removes one item (the location it may conditionally
jump to) from the stack and so "balances" a stackplus1 string of code.

We  may  assume  as  the  inductive  assumption  the  following  on  any  smaller  pieces  of  code. 

stackplus1( FCOMPILE(general-expression, ... ) )

stackok( FCOMPILE(statement, ... ) )

With  the  use  of  stackok  axioms  S3  and  S7  (which  do  not  change,  since  they  do  not 
mention  specific  target  language  instructions),  we  immediately  show  the  stackok  property  on 
both  of  the  above  paths.  This  concludes  the  PL/0  WHILE  case  proof. 
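
The path argument can be pictured as simple bookkeeping on the stack pointer. The Python sketch below assigns each path element its net effect on T (a compiled general expression +1, a compiled statement 0, and JPC' -1, as stated by the inductive assumption and axiom S10) and sums along the two paths. The names `DELTA` and `net_effect` are hypothetical, and the model tracks only the pointer; the stackok property as used in the proof also requires the stack contents up to T to be preserved, which this count does not check.

```python
# Net change to the stack pointer T for each kind of path element.
DELTA = {
    "expr": +1,   # stackplus1: a compiled general expression
    "stmt": 0,    # stackok: a compiled statement
    "JPC'": -1,   # the stack-affecting part of a conditional jump
}


def net_effect(path):
    """Sum of pointer changes along a path between assertions."""
    return sum(DELTA[element] for element in path)


path1 = ["expr", "JPC'", "stmt"]   # condition true: fall through into the body
path2 = ["expr", "JPC'"]           # condition false: leave the loop
```

Both sums are zero, matching the conclusion that each path between assertions has the stackok property.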


5.1  Introduction

In  the  following  two  sections  are  listed  the  published  compiler  proofs  with  a brief 
description of each. The sections are divided according to the manner in which the compiler
is  specified,  as  this  has  a distinct  effect  on  the  way  in  which  the  proof  is  carried  out. 
Discussion  of  their  relation  to  this  dissertation  appears  in  Section  5.4. 


5.2  Proofs of Compilers Expressed Operationally

Proving  a program  correct  requires  a specification  of  what  the  program  is  supposed  to 
do.  In  the  case  of  a compiler,  this  usually  amounts  to  a statement  that  the  compiled  program, 
when  run,  has  the  same  effect  as  the  source  program  would  have  had  if  it  could  have  been 
run. Thus the specification must somehow include the meaning or semantics of both the
source  and  target  language  of  the  compiler  we  wish  to  prove. 

McCarthy  [McCarthy63,McCarthy66]  was  the  first  to  suggest  a method  of  expressing 
the  syntax  and  semantics  of  a language  in  a manner  useful  to  compiler  proofs.  He  proposed 
describing  the  syntax  of  a language  by  the  use  of  predicates  and  functions  which  would 
recognize  syntactic  types  of  the  language  and  break  down  syntactic  constructions  into  their 
component  parts,  respectively.  These  abstractions  help  us  to  define  the  syntax  and  semantics 
of  the  language  as  well  as  to  translate  from  the  language  to  another.  The  abstractions  make 
the  proof  independent  of  the  particular  forms  used  to  express  the  source  language.  He  then 
defined  semantics  of  expressions  in  terms  of  their  values,  which  are  defined  recursively  as 
functions  of  the  values  of  the  component  syntactic  parts.  Semantics  of  statements  are  similarly 
expressed  in  terms  of  their  effects  on  the  program’s  state  vector.  McCarthy  and  Painter 
[McCarthy67] then used these concepts to prove a compiler to translate very simple arithmetic
expressions into a simple machine language containing load, store, add, and load-immediate
instructions. Their compiler was basically a case statement, with recursive compiler calls for
subexpressions,  each  case  corresponding  to  an  allowable  syntactic  structure.  Ideas  first  used  in 
their  paper  are  found  in  much  of  the  later  compiler  correctness  work.  It  is  their  compiler  that 
is  used  as  an  example  (in  Section  4.2)  of  our  compiler  proof  methods. 

Still using similar concepts, Painter [Painter67] proved a compiler that would accept
assignment statements, conditional gotos, and I/O statements. About the same time Kaplan
[Kaplan67] produced a proof using recursion induction, again based on McCarthy’s work, of a
compiler similar to Painter’s, but without the I/O statements. Burstall [Burstall69a] presented
a proof of a simple expression compiler in the McCarthy-Painter vein to demonstrate the
usefulness of structural induction. A paper by Wada et al. [Wada73] applies the McCarthy-
Painter  methods  to  prove  a rather  simple  compiler. 

London [London71,London72] proved two compilers which take a subset of pure Lisp
into an assembly language with, among other features, push and pop stack instructions. This
was the first example of a compiler proof where either the language in which the compiler
was written, in this case Rlisp, or the source language was a computer language in existence
for purposes other than a proof. In London’s work the task of specifying the effect of
executing a statement is simplified by the fact that in the source language the effect is only to
return a single value. The first of these two compilers is, in slightly modified form, the
compiler  used  as  the  principal  example  (presented  in  the  appendix)  of  the  compiler  proof 
techniques  presented  here.  The  second  of  the  compilers  proved  in  London’s  work  provides  us 
with the example used here to show proof of optimized versions of compilers (see Section
A.9).

Although  Ragland  [Ragland73]  proved  a verification  condition  generator  rather  than  a 
compiler,  his  work  is  relevant  because  verification  condition  generators  possess  some  of  the 
same problems as compilers. The verification condition generator proof resembles a compiler
proof in the important respect that the correct operation of either of those types of programs
depends  on  the  semantic  meaning  of  its  source  language  (that  language  in  which  programs 
must  be  written  to  be  processed  by  the  verification  condition  generator  or  compiler).  The 
verification  condition  generator  then  implements  those  semantics  rather  than  producing  target 
language  with  equivalent  semantics  as  a compiler  does.  Thus  proof  of  the  generator  requires 
the  specification  of  semantics  for  more  than  just  the  language  in  which  it  was  written. 
Ragland  chose  the  use  of  Hoare  proof  rules  to  describe  the  source  language,  a choice  which 
has  been  found  useful  in  compiler  proofs,  and  was  in  fact  used  in  this  present  work. 

Newey [Newey75] proved a Lisp interpreter. He then described how to prove the first
of the compilers London proved, but Newey was not able to carry it out in the automated
Scott logic system he was using. Newey’s method of proof was to give an interpreter for the
source  language  and  an  interpreter  for  the  target  language,  and  then  to  show  that  the  result  of 
running  the  first  interpreter  on  a source  language  function  is  always  the  same  as  running  the 
second  interpreter  on  the  compilation  of  the  function.  This  proof,  as  did  London’s,  makes  use 
of  the  fact  that  the  only  effect  of  a source  language  function  is  to  return  a single  value.  The 
methods  presented  in  this  present  work  do  not  need  this  restriction. 

Chirica and Martin [Chirica75] first clearly divided a compiler proof into two parts:
"what  is  produced"  and  "what  it  means."  Their  source  language  has  expressions,  assignments, 
conditionals,  while  loops,  and  block  structure,  but  not  procedures  or  functions.  They  stated 
that  the  method  could  be  extended  to  certain  kinds  of  procedures,  however.  They  did  not 
prove  the  parsing  part  of  the  compiler,  but  dealt  only  with  proving  the  code  generation  part. 
Their compiler was written in a simple language with such features as assignment and whiles.
They  state  that  their  proof  method  will  work  with  a variety  of  methods  of  expressing 
semantics,  including  the  Hoare  proof  rules  they  used.  Chirica’s  subsequent  dissertation 
[Chirica76] announced a change in approach from the Chirica-Martin paper. The new
direction is toward an algebraic approach, and so is described in the next section.

Samet [Samet75] proves the equivalence of target code produced by an optimizing
compiler  to  that  produced  by  a non-optimizing  compiler.  Thus,  he  was  not  addressing  our 
problem of proving the equivalence of source and target code. However, his work could be
used to prove an optimizing compiler by showing equivalence to a simpler compiler. This
approach  appears  to  be  more  easily  carried  out  than  proving  an  optimizing  compiler  directly. 
Thus  Samet’s  work  could  be  used  to  advantage  for  equivalence  proofs  as  described  in  Section 
7.3. 

Boyer and Moore [Boyer77] have recently proved a simple optimizing compiler written
in Lisp to compile certain Lisp expressions. The proof was carried out on a more powerful
version of the Boyer-Moore Lisp Theorem Prover [Boyer75]. It was accomplished entirely
without  human  guidance  with  the  exceptions  of  stating  the  correctness  theorem  about  the 
compiler,  and  asking  that  a few  lemmas  be  established  before  the  main  result.  Their  system 
uses  previously  stated  and  established  lemmas  in  making  further  proofs.  The  statement  of 
correctness  was  a Lisp  expression  stating  that  executing  the  target  code  produces  a value  equal 
to  the  value  of  the  source  expression.  The  semantics  of  the  source  and  target  languages  were 
expressed  by  interpreting  functions.  The  size  of  the  compiler  proved  was  much  smaller  than 
MCO,  but  the  lack  of  the  need  for  human  intervention  during  the  proof  is  impressive. 


5.3 Proofs of Compilers Expressed As Mathematical Functions

In  what  we  here  call  the  mathematical  function  approach,  the  compiler  is  specified  as  a 
mathematical function rather than as a program. Because actual compilers are written in
computer  languages,  not  as  functions,  we  view  this  specification  as  a shortcoming,  unless 
auxiliary  proof  is  given  to  show  the  equivalence  of  the  compiler  code  and  the  compiling 
function.  Of  the  papers  mentioned  below,  only  Chirica  addresses  this  problem.  This 
functional  approach  does,  however,  add  to  our  understanding  of  compiling  by  identifying 
some  of  the  structure  underlying  computer  languages  and  compiling. 

Burge [Burge68] gave a proof of a compiler for lambda calculus expressions. It
flattened  the  tree  structure  of  the  expressions  and  accomplished  some  binding  of  variables. 

Blum  [Blum69]  gave  a proof  of  a functionally  expressed  compiler  to  take  recursive 
functions  to  Turing  code.  Such  choices  of  languages  seem  to  be  of  little  practical  importance. 

An  algebraic  approach  to  the  compiler  correctness  problem  was  introduced  by  Burstall 
and  Landin  [Burstall69b].  This  approach  is  to  define  semantics  (in  a form  like  interpreting) 
and compiling as algebraic functions. The proof of the correctness then involves showing that
the result of the source interpreting function is the same as the result of the target interpreting
function applied to the compiled source, modulo some method (a function of course) of
translating target semantic terms into source semantic terms. Their method may be
diagrammed (and was by them similarly) as follows:


    Source          Compile          Compiled
    Expression  --------------->    Expression
        |                               |
        | Source Language               | Target Language
        | Interpret                     | Execute
        v                               v
    Source          Unload           Target
    Semantics   <---------------    Semantics


The unload operation is the inverse of load, which is to load the value (semantics) into
the target language stack. Unload is then the method of translating target semantic terms into
source semantic terms. By using properties of the algebras and functions representing the
corners and sides of this diagram, they showed that either way around the box (that is, by
interpreting  or  by  compiling,  executing,  and  unloading)  results  in  the  same  value. 
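
The "two ways around the box" property can be illustrated with a small sketch in Python. Everything in it is an assumption made for illustration: the toy expression language, the stack machine, and the names interpret, compile_expr, execute, unload, and commutes are hypothetical and are not taken from Burstall and Landin. The sketch only checks the equation on particular inputs; it is a test, not a proof.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical source language: integer constants, variables, and sums.
@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

Expr = Union[Const, Var, Add]

def interpret(e, env):
    """Source semantics: the value of expression e in environment env."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    return interpret(e.left, env) + interpret(e.right, env)

def compile_expr(e):
    """Compile e into instructions for a hypothetical stack machine."""
    if isinstance(e, Const):
        return [("PUSH", e.value)]
    if isinstance(e, Var):
        return [("LOAD", e.name)]
    return compile_expr(e.left) + compile_expr(e.right) + [("ADD",)]

def execute(code, env):
    """Target semantics: run the code and return the final stack."""
    stack = []
    for instr in code:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        elif instr[0] == "LOAD":
            stack.append(env[instr[1]])
        else:  # "ADD": replace the top two entries by their sum
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
    return stack

def unload(stack):
    """Translate target semantics (a stack) back into a source value."""
    return stack[-1]

def commutes(e, env):
    """Compare interpreting with compiling, executing, and unloading."""
    return interpret(e, env) == unload(execute(compile_expr(e), env))
```

For example, commutes(Add(Var("x"), Const(3)), {"x": 1}) compares the value obtained by interpreting with the value obtained by compiling, executing, and unloading. A proof in the algebraic style establishes this equality for all expressions by induction on their structure rather than by sampling.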

A similar  diagram  could  be  drawn  for  the  methods  of  compiler  proof  given  here.  The 
source  language  at  the  corner  of  our  diagram  would  represent  a single  syntactic  case,  though, 
since we have not pushed the induction on the source language syntactic structure into the
algebraic  representation.  Further,  the  semantics  are  represented  by  Hoare  proof  rules  and 
Hoare  formulas.  The  process  of  showing  that  the  semantics  are  true  of  the  program  is  that  of 
verification  condition  (VC)  generation  and  proof  of  the  resulting  theorems.  The  verification 
process on the left is unnecessary, however, because the Hoare proof rules are exactly the
definitions  of  the  semantics  of  the  various  source  syntactic  cases.  Therefore  we  have  labelled 
this  process  as  semantic  definition.  The  translation  between  source  and  target  language 
semantics  proceeds  in  the  other  direction  from  the  unload  operation.  This  process,  which  we 
have called name substitution in the diagram, is that of adding the v-w and c-k (NIL-0 in
MCO)  substitutions  to  the  Hoare  proof  rules,  as  well  as  the  quantification,  formal  argument, 
and  result  substitutions  derived  in  Chapter  4. 

    Source          (Part 1 proof)     Compiled
    Syntactic          Compile         Syntactic
    Case        ------------------>    Case
        |                                  |
        | Semantic                         | VC Generation
        | Definition                       | and Proof
        |                                  | (Part 2 proof)
        v                                  v
    Source             Name            Target
    Semantics      Substitution        Semantics
    (Hoare      ------------------>    (Hoare
    proof rules)                       formulas)

Again we show that two ways around the box arrive at the same semantics, this time target
semantics, though. Note that we have marked the part one and part two proofs of our
method. 

To  return  to  Burstall  and  Landin,  they  carried  out  a proof  for  a compiler  for 
expressions  whose  target  machine  used  a stack.  Then  in  stages  the  proof  was  extended  to  a 
conventional  accumulator-memory  type  of  target  machine.  This  algebraic  proof  apparently 
served  as  the  inspiration  for  Morris’s  work  and  Chirica’s  work  described  below. 

Morris  [Morris72]  treats  compiling  and  interpreting  as  algebraic  functions  to  prove  the 
correctness  of  a compiling  function.  Not  all  compilers  can  be  fit  into  the  algebraic  model  he 
used. His source language has expressions, concatenated statements, assignment statements,
while loops, and gotos. In a later work [Morris73] he carried out part of a compiler proof
similar  in  concept  to  his  first,  but  using  a different  kind  of  algebra  as  the  structure  of  the 
languages  and  semantics. 

As noted in the previous section, Chirica’s dissertation [Chirica76] represents a change
in approach from Chirica-Martin [Chirica75]. The new approach followed Burstall and
Landin’s,  but  Chirica  applied  new  types  of  algebras.  The  problem  still  arises  of  finding  the 
algebraic  function  to  translate  from  target  semantic  terms  to  source.  The  shortcoming 
mentioned  above  of  proving  a compiling  function  rather  than  a compiling  program  is 
addressed by showing how assertions may be generated for applying the inductive assertion
method  to  a program  to  show  its  equivalence  to  a compiling  function.  The  source  language  of 
the  compiler  proved  has  such  features  as  assignment,  conditionals,  while,  block  structured 
declarations,  read,  write,  integer  arithmetic,  and  logical  expressions.  Although  the  language 
lacks functions or procedures, it is stated that no difficulties were found in applying these
techniques  to  a source  language  with  recursive  procedures  with  no  global  variables  (a  common 
restriction;  for  instance,  MCO  has  no  global  variables). 

Using  a mechanical  first  order  predicate  calculus  proof  checker,  W.  Diffie  checked  the 
McCarthy-Painter  proof  about  1971  (this  unpublished  work  is  mentioned  by  Igarashi, 
London, and Luckham [Igarashi75, p. 179]). With this possible exception, Milner and
Weyhrauch  [Milner72]  were  the  first  to  apply  machine  aid  to  the  compiler  proof  process  by 
submitting  much  of  a compiler  proof  to  a mechanical  proof  checker.  Their  compiler  took  a 
simple  Algol-like  language  to  an  assembly  language  with  stacks.  The  compiler  was  described 
(written), as well as proved, in Scott’s system of logic [Scott70].

Germano  and  Maggiolo-Schettini  [Germano75]  use  Markov  algorithms  for  the  compiler 
and  for  the  target  language.  These  language  choices  are  also  of  little  practical  importance. 


5.4  Significance  of  this  Work 

The principal contributions of this dissertation are to present a method of proving
compilers with machine assistance, and to give example proofs using this method to
demonstrate its utility. The principal features of the method are the use of Hoare proof rules
to describe the semantics of the target, source, and compiler languages, and the use of a
formalism to describe and simplify the substitutions of variables that result from the use of
Hoare rules. The method also involves the use of many other techniques of lesser importance
or originality. However, the practicality of the method is greatly enhanced by the combination
of these techniques.

Machine  assistance  that  has  appeared  in  previous  compiler  proofs  includes  the 
generation of verification conditions in the Igarashi, London, and Luckham example
[Igarashi75], Newey’s use of an automated Scott logic system [Newey75], the machine checking
of a hand-generated proof by Milner and Weyhrauch [Milner72], and the nearly completely
automatic  proof  of  Boyer  and  Moore  [Boyer77].  Thus  this  dissertation  is  one  of  only  a few 
efforts  that  use  interactive  machine  assistance  in  the  proof  of  a compiler.  The  significance  of 
this  work,  though,  is  that  it  gives  methods  that  we  believe  will  allow  larger  and  more  complex 
compilers  to  be  proved  with  machine  assistance  than  was  possible  with  past  methods.  MCO, 
the  compiler  proved  here,  though  by  hand  application  of  our  methods,  is  much  larger  than 
the  compilers  in  the  papers  cited  above,  except  Newey’s.  It  is  essentially  the  same  compiler  as 
was  not  completely  proved  in  Newey’s  work.  We  think  these  facts  are  evidence  that  our 
methods  represent  a step  toward  proving  larger  compilers.  Further  evidence  is  given  by  the 
fact  that  our  methods  have  been  applied  to  part  of  a still  larger  compiler,  the  PL/0  one,  and 
we see no reason why a program-proving system such as described in Section 7.2 would not
be  able  to  prove  all  of  it. 

This proof of MCO, while using a compiler quite similar to one previously proved by
London [London72], employs several techniques that in general formalize the proof methods,
and  should  help  in  applying  machine  aid  to  future  compiler  proofs.  Probably  the  largest 
problem in trying to mechanize London’s type of proof is having to symbolically execute the
symbolically expressed target code. Existing symbolic execution systems are made to execute
completely specified code, not symbolically expressed code. We avoid this problem by
restructuring the problem to require only generation of verification conditions across the
symbolically expressed target code. Then we give the formalization of substitution to allow us
to accomplish the generation and proof of the resulting verification conditions. A further
difference between our proof and London’s is that our semantic definition method is not
restricted to an expression language which returns a single value, but instead applies to
languages whose semantics can be expressed by Hoare proof rules. Clarke [Clarke77] has
obtained some interesting results defining which programming features may be included in a
language whose semantics are to be completely expressed (complete in a sense described in
Clarke’s paper) by Hoare proof rules.

This dissertation represents the first complete published example of how Hoare type
proof rules can be used not only on the compiler code being proved, but also for the
description of the semantics of the source and target languages. Chirica and Martin
[Chirica75] gave a general outline of such a proof, but did not present all the formal details.
It is interesting that in Chirica’s later work [Chirica76] the use of Hoare rule semantics was
abandoned in favor of an initial algebra approach to semantics. Chirica comments
[Chirica76, p. 273] that a proof directly from the Hoare rules "is simply unmanageable!" While
the length of the part two proofs in Section A.8 might at first lend credence to his view, we
feel that application of machine assistance overcomes this problem. Our methods of proof
were developed with the thought of running them on a mechanized proving system, and have
in fact been run for the proof of the McCarthy-Painter compiler. With a system such as that
discussed  in  Section  7.2,  we  believe  compilers  at  least  as  complex  as  MCO  could  be 
interactively  proved. 

A new  technique  used  in  the  proof  of  MCO  is  in  the  formalization  of  substitution  and 
its properties. The Hoare proof rule for an assignment statement requires that a substitution
of  an  expression  for  the  free  uses  of  a variable  be  carried  out  upon  an  assertion.  This 
substitution  produces  the  assertion  that  must  be  true  before  execution  of  a statement,  from  the 
assertion that is true after execution. In the proof of MCO we are often working with
assertions  containing  symbolic  names  for  parts  of  target  code  and  target  expressions.  Thus  we 
cannot  actually  perform  substitutions  prescribed  by  Hoare  rules  in  many  cases.  So  we  develop 
a notation  to  indicate  where  a substitution  is  to  be  performed  and  a series  of  simplification 
rules  to  simplify  such  notation.  That  this  formalization  may  be  used  on  a mechanized 
program proving system was shown by its use in the Xivus system for the part two proof of
the  McCarthy-Painter  compiler.  We  believe  that  a program  proving  system  like  that 
described  in  Section  7.2  would  in  fact  make  this  formalization  quite  easily  used  in  compiler 
proofs. 
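
The idea of indicating a substitution instead of performing it can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the notation or the simplification rules of this dissertation: the tuple encoding of terms, the "sym" and "subst" tags, the not_free argument, and the function names subst and assign_pre are all hypothetical. A "sym" term stands for a symbolic name whose contents are unknown, so a substitution into it is left as a "subst" marker unless the variable is known not to occur free in it.

```python
def subst(term, var, repl, not_free=frozenset()):
    """Substitute repl for free uses of var in term, deferring where needed.

    Terms are tuples (a hypothetical encoding):
      ("const", n)               a constant
      ("var", name)              a program variable
      ("op", f, arg1, ..., argn) an operator applied to argument terms
      ("sym", name)              a symbolic name for an unknown term
      ("subst", t, v, r)         a substitution that has been deferred
    not_free holds pairs (sym_name, var_name) meaning that the variable
    is known not to occur free in the symbolically named term.
    """
    kind = term[0]
    if kind == "const":
        return term
    if kind == "var":
        return repl if term[1] == var else term
    if kind == "op":
        args = tuple(subst(a, var, repl, not_free) for a in term[2:])
        return ("op", term[1]) + args
    if kind == "sym" and (term[1], var) in not_free:
        return term  # simplification: substitution has no effect
    # Unknown contents (or an already deferred substitution): only indicate it.
    return ("subst", term, var, repl)

def assign_pre(post, var, expr, not_free=frozenset()):
    """Precondition of the assignment var := expr for postcondition post."""
    return subst(post, var, expr, not_free)
```

For the postcondition x = E + 1, where E is a symbolic name, the assignment x := y gives a precondition in which the substitution into E is only indicated. If x is known not to occur in E, the indicated substitution simplifies away and the precondition becomes y = E + 1.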

A serious  omission  of  the  Chirica  and  Martin  work  [Chirica75]  is  its  lack  of  treatment 
of  user-defined  functions.  This  present  dissertation  is,  we  believe,  the  only  proof  of  a 
compiler  using  any  machine  assistance  that  treats  such  functions,  which  are  a major  tool  in  the 
use of nearly every computer language. Thus the use of Hoare proof rules to define the
source language semantics has now been shown to work for this important concept of
functions in source languages. In fact it was found that the treatment of functions, including
the  similar  concept  of  Lisp  lambda  expressions,  was  at  least  as  complex  to  prove  as  the 
combination  of  all  the  other  syntactic  types  of  the  source  language. 

In Igarashi, London, and Luckham [Igarashi75] the Hoare rules were given for the
various  types  of  statements  assuming  each  contained  no  nested  calls  to  user-defined  functions. 
It  was  further  stated: 

The definition of P [precondition] and R [postcondition] as conjunctions
means some loss of generality if nested function calls occur such as in Y ←
G(H(X)). A more complicated definition of P and R is known for such
cases but it is not implemented.

Such  expanded  rules  for  nested  calls  were  developed  for  an  early  version  of  the  proof 
rules  for  Euclid,  but  were  not  used  in  the  final  version  [London77].  In  this  dissertation  we 
borrow  those  nested  function  rules  and  pioneer  in  their  use  in  compiler  proofs.  Such  rules  use 
a precondition  and  postcondition  recursively  defined  for  a series  of  nested  functions.  Use  of 
them  in  a compiler  proof  makes  precise  the  expression  of  the  effects  of  executing  nested 
functions  in  addition  to  the  already  demonstrated  facility  of  Hoare  rules  for  statements.  Thus 
the  method  of  compiler  proof  in  this  work  can  be  applied  to  either  a statement  type  or 
expression  type  of  source  language. 

Few of the other works have applied the inductive assertion technique to compiler code.
Many, such as Milner and Weyhrauch [Milner72] and Morris [Morris72], have defined the
compiler  as  a mathematical  function  rather  than  actual  code.  Since  a compiler  must  be  written 
as  code  to  be  used,  we  feel  the  distinction  is  important.  London’s  proof  of  two  Lisp  compilers 
[London71]  uses  compilers  expressed  as  code,  but  hand  executes  it  to  determine  results.  A 
mechanical  procedure  that  formally  turns  code  and  assertions  about  the  code  into  verification 
conditions  has  been  applied  in  one  example,  that  being  the  final  example  in  Igarashi,  London, 
and  Luckham  [Igarashi75];  but  the  formal  proof  of  those  verification  conditions  was  not 
carried  out.  Chirica  and  Martin  [Chirica75]  use  the  inductive  assertion  method  on  the  code 
generation  portion  of  a compiler  expressed  as  code.  In  our  method  applied  to  MCO,  an  entire 
compiler is given as code, the assertions about the compiler are given, a mechanical method is
applied  to  produce  verification  conditions,  and  they  are  proved.  Thus  this  is  one  of  only  two 
relatively  complete  applications  of  the  inductive  assertion  method  on  a compiler. 

If we are to prove actual compilers, not just abstract compiling methods, we must apply
the method of inductive assertions to the compiler code and complete the proof of the entire
compiler. The only alternative we see is an auxiliary proof that the compiler is equivalent to
the  abstract  compiling  method.  We  cannot  believe  that  a proof  of  a mathematically  defined 
compiler  alone  is  adequate  proof  of  a corresponding  actual  compiler  because  there  are  too 
many  ways  in  which  the  actual  compiler  may  subtly  differ.  Computer  programming  has 
produced  countless  examples  of  programs  with  subtle  differences  between  the  code  and  the 
intent. Chirica [Chirica76] seems to be the only one among those expressing compilers as
functions  (rather  than  code)  who  has  addressed  this  problem.  He  applies  an  auxiliary 
inductive  assertion  proof  after  the  proof  of  his  compiling  function  to  show  that  the  compiling 
function  actually  represents  what  the  compiler  code  does.  Our  approach  is  more 
straightforward in the sense that it applies the inductive assertion methods for the entire
proof. 

A technique  used  in  this  proof  which  has  appeared  in  only  one  other  work  [Chirica75] 
is that of dividing the compiler proof into two distinct parts. First it is proved what the
compiler produces for each source language syntactic type, then it is proved that the target
language produced has the same effect as the source language does (or would have if source
language were directly executed). The division into two parts simplifies the rather complex
problem of proving a compiler. It allows us to deal with the compiler code in part one,
keeping it separate from the use of source and target language semantics, which enter only the
part two proof. The first part can be fairly simply handled with well known program proving
techniques.  It  requires  only  the  addition  of  a good  way  of  speaking  about  the  structures  of  the 
source  and  target  languages  (lists  in  MCO)  to  a typical  program  proving  system.  The  second 
part  here  was  carried  out  interactively  on  a small  example,  and  a system  is  discussed  in  Section 
7.2  that  would  prove  more  complex  compilers. 

Proof  of  the  properties  of  the  target  machine  stack  during  execution,  which  is  required 
to validate the Hoare rule formulation, is carried out by an axiomatic approach similar to the
subgoaling  techniques  used  by  Suzuki  [Suzuki75].  We  felt  that  such  properties  could  be  more 
understandably  expressed  by  this  approach  than  by  dealing  directly  with  the  messy  details  of 
the  stack  implementation,  and  the  ease  with  which  the  stack  proofs  were  carried  out  bears  out 
this feeling. This is the first application of such techniques to a compiler as a whole, although
Guttag, Horowitz, and Musser [Guttag76] have given a symbol table example proved
axiomatically. 

Further  use  of  axiomatic  definitions  is  made  in  defining  the  source  language  syntax 
functions  and  the  Lisp-like  functions  used  in  the  compiler  to  build  or  take  apart  lists.  We 
consider  these  uses  important  because  they  reduce  the  complexity  of  the  verification  conditions 
by postponing the entry of information into the verification conditions until and unless it is
needed  in  their  proofs. 

The techniques used to address the gensym problem, that of handling in a proof a Lisp
function that is not a mathematical function, were described in Section 3.17. Gensym or
something like it is needed in a compiler to generate unique labels. The only other approach
to  the  gensym  problem  we  have  seen  is  that  of  Milner  and  Weyhrauch  [Milner72].  We 
believe  our  approach  is  a higher  level  of  abstraction  than  Milner  and  Weyhrauch’s,  and 
results  in  a less  messy  proof. 
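
Why gensym is not a mathematical function can be seen in a short Python sketch. The sketch is an illustration only and does not reproduce the treatment of Section 3.17: the names gensym, fresh_label, and compile_if, the "L" label prefix, and the counter-threading style are all assumptions introduced here. The first definition returns a different result on each call with the same arguments; the second is an ordinary function of an explicit counter, which is one common way to make label generation amenable to equational reasoning.

```python
import itertools

# Hidden state: this is what makes gensym fail to be a mathematical function.
_counter = itertools.count(1)

def gensym():
    """Stateful label generator: equal arguments, different results."""
    return f"L{next(_counter)}"

def fresh_label(n):
    """Pure alternative: take the counter, return (label, next counter)."""
    return f"L{n}", n + 1

def compile_if(n):
    """Hypothetical use: allocate two distinct labels for a conditional."""
    else_label, n = fresh_label(n)
    end_label, n = fresh_label(n)
    return (else_label, end_label), n
```

With the pure version, the uniqueness of the labels follows from a simple property of the counter (each call returns a strictly larger counter value), which can be stated as an assertion about the compiler.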

The use of the list notation described in Section 3.5 is important because it helps reduce
the  complexity  of  the  assertions  that  must  be  written  for  the  part  one  proof.  One  of  the 
criticisms  of  program  proving  that  has  often  been  presented  is  that  assertion  writing  is  too 
error-prone.  To  counter  this  criticism  we  must  use  tools  such  as  the  list  notation  to  simplify 
the  assertion  writing  process. 


6.1 Introduction

The conclusions reached from proving MCO, the McCarthy-Painter compiler, and part
of the PL/0 compiler are that certain of the methods employed here in the construction of
compiler proofs are of benefit in proving compilers. Secondarily, we believe that certain
methods used in constructing compilers are of benefit in their proofs. Further, these methods
should help produce more reliable, understandable, and maintainable compilers, particularly if
they are proved, but also to a lesser degree if they are only kept in mind during compiler
construction. The following sections list such methods in the order of importance we assign
them.


6.2 Hoare Proof Rule Semantics

One of the goals of defining the language Pascal by the use of Hoare proof rules
[Hoare73] was to communicate the language designer’s intent to the language implementers,
that is, the compiler writers. It is also stated there that such a definition serves as "an
axiomatic  basis  for  formal  proofs  of  properties  of  programs."  We  feel  that  the  proofs  given 
here  demonstrate  that  such  language  definitions  serve  not  just  as  the  basis  for  proving 
properties  of  a program  (the  compiler),  but  further  serve  to  define  the  properties  of  all  three 
languages  involved  in  the  compiler  proof.  The  proofs  here  presented  worked  directly  from 
the  Hoare  proof  rule  description  of  the  semantics  of  the  compiler’s  source  language. 

We  also  demonstrate  in  these  proofs  the  use  of  Hoare  proof  rules  to  describe  the  target 
language. It has been argued that proving machine language programs by such techniques as
Hoare proof rules and verification condition generation is impractical because machine
language is able to operate at bit levels, modify code, jump to code anywhere in memory, etc.
But MCO, as well as most other compilers, produces machine language code that intentionally
uses separate areas for code and data, uses a very small subset of the available operations, and
has  a strictly  specified  interface  between  functions.  In  fact  MCO  and  the  McCarthy-Painter 
compiler  (but  not  the  PL/0  compiler)  produce  target  code  without  any  loops.  In  other  words, 
the  target  code  produced  is  well-behaved.  For  instance,  it  is  the  strict  function  interface 
definition  specifying  where  arguments  are,  where  the  result  is  returned,  where  the  return  point 
is left, and how the stack is to be preserved that allows us to write the Hoare rule for the
CALL statement. Thus for such well-behaved use of machine language, we believe Hoare
proof  rules  are  an  appropriate  method  of  describing  machine  language  semantics. 


6.3 Substitution Formalization

A fairly lengthy description of the substitution formalization used in this proof is given
in Section 3.6. We will briefly review here why this formalization was needed. The use of
Hoare rules, as described above, requires substitution to be performed on assertions. The use
of  symbolically  expressed  assertions  and  code,  which  occurs  in  the  part  two  proof,  prevents  the 
substitutions  from  being  performed.  Therefore  the  substitutions  are  indicated  symbolically, 
and  the  need  arises  for  a method  of  manipulating  and  simplifying  them.  The  complexity  of 
formulas  involved  in  the  MCO  part  two  proof  was  simply  too  great  to  be  dealt  with  without 
such a method. The properties of substitution given here met that need, and further were
found to be easily mechanized when they were applied in the automated part two proof of the
McCarthy-Painter compiler. The ability to be mechanized is important because, as is
discussed  below,  we  feel  it  is  necessary  to  apply  machine  assistance  to  the  problem  of  proving 
compilers. 


6.4  Machine  Assistance  for  Proofs 

We  believe  that  program  proving  should  be  done  interactively.  The  human  should  be 
involved  for  the  difficult  and  insightful  portions  of  a proof,  while  the  machine  should  handle 
routine and repetitive parts. The proof of MCO contained far too much routine detail (see the
lengthy  Section  A.8)  to  reasonably  expect  humans  to  prove  compilers  this  way,  or  to  take  the 
time  to  understand  the  proof  once  completed.  Further,  the  amount  of  detail  will  grow  as  we 
attempt  proofs  of  larger  compilers.  Yet  we  know  of  no  automatic  proving  system  capable  of 
proving  MCO  without  human  intervention.  In  fact  the  part  two  proof  of  MCO  was,  we 
believe,  beyond  the  capability  of  any  computer-assisted  proving  system,  with  or  without 
human  help  (see  Section  3.9  for  justification  of  this  claim),  although  the  system  of  Boyer  and 
Moore [Boyer77] has done quite well with proving a simple compiler with practically no user
intervention. We envision (in Section 7.2) an interactive compiler proving system using the
approach  given  here  that  could  be  built  in  the  immediate  future.  The  point  is,  we  believe 
compiler  proofs  to  be  too  tedious  to  expect  to  do  them  by  hand,  and  to  be  beyond  the 
capability  of  completely  automatic  theorem  provers;  therefore  they  must  be  done  interactively. 

6.5  Axiomatic  Descriptions 

As  explained  in  Chapter  3,  rewrite  rules  and  axioms  were  used  to  describe  the  Lisp-like 
functions,  the  source  language  syntax,  the  substitution  formalization,  and  stackok  (the  property 
of  run-time  stack  preservation).  The  ease  with  which  we  completed  portions  of  the  proof  of 
MCO  dealing  with  these  aspects  leads  us  to  recommend  their  use.  Such  methods  tend  to  be 
mechanizable (for example, the DTVS system [Guttag76, p. 35] mechanizes such methods) and
fit well into our philosophy of proving compilers interactively.
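
How rewrite rules can stand in for the definitions of Lisp-like functions may be sketched in Python. The two rules used, car(cons(x, y)) = x and cdr(cons(x, y)) = y, are the familiar list axioms; the tuple encoding of terms and the function name rewrite are assumptions made for this illustration and do not reproduce the rules given in Chapter 3.

```python
def rewrite(term):
    """Simplify a term bottom-up using two list axioms as rewrite rules.

    Rules:  car(cons(x, y)) -> x      cdr(cons(x, y)) -> y
    A term is either an atom (a string naming a symbolic value) or a
    tuple (function_name, arg1, ..., argn).
    """
    if not isinstance(term, tuple):
        return term  # atoms are already in simplest form
    head, *args = term
    args = [rewrite(a) for a in args]  # simplify the arguments first
    if (head in ("car", "cdr")
            and isinstance(args[0], tuple)
            and args[0][0] == "cons"):
        _, x, y = args[0]
        return x if head == "car" else y
    return (head, *args)  # no rule applies: rebuild the term
```

A term such as car(lst), where nothing is known about lst, is left unchanged. The axioms contribute to a formula only where they actually apply, in keeping with the aim of introducing properties at theorem-proving time.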

Introducing  axiomatic  properties  at  the  time  of  theorem  proving  helps  reduce  the 
accumulation  of  data  into  verification  conditions,  which  is  invariably  a problem  in 
constructing  and  understanding  large  program  proofs.  If  all  properties  of  functions  (such  as 
the source syntax predicates or the Lisp-like functions) were introduced into the code of the
compiler, then all such properties would appear in all verification conditions involving those
functions, whether the properties were needed in that particular verification condition or not.
This would substantially reduce the readability of the verification conditions.

Guttag, Horowitz, and Musser [Guttag76] give further justification for such techniques,
and  in  fact  use  a symbol  table  as  an  example  of  where  the  methods  may  be  applied. 


6.6 Abstract Data Types

The case has been made in many places, among them the Alphard language work
[Wulf76], that structuring data and the operations upon them using abstractions will aid in
producing  better  programs.  It  is  noted  there  [Wulf76,  p.  254]  that  grouping  the  associated 
data  and  functions  of  an  abstraction  has  the  following  advantages. 

1) The places where modifications must be made are more likely to be close
together. 

2)  A smaller  portion  of  the  program  will  be  likely  to  require  reverification 
when  a change  is  made. 

3) The user of the abstraction may ignore the details of the implementation.

4)  It  becomes  possible  to  make  absolute  statements  about  certain  things  (e.g., 
data  structures)  which  are  independent  of  even  perverse  programmers. 

5)  The  implementation  of  the  abstraction  may  (sometimes)  ignore  the 
complexity  of  the  environment  in  which  the  abstraction  will  be  used. 

We find such concepts to be applicable to compilers. In particular, the source language,
target  language,  location  table  (symbol  table),  and  run-time  stack  each  have  their  own 
structure  and  operations.  In  fact,  the  function  of  a compiler  may  be  viewed  as  rendering  the 
source  language  structure  into  its  component  parts,  then  building  a target  language  structure, 
the  constraint  being  to  preserve  semantics.  Thus  these  data  structures  play  a very  important 
role  in  the  construction  of  a compiler,  and  abstractions  of  them  are  likely  to  have  a large  effect 
on  the  quality  of  the  code  of  a compiler  and  the  ease  of  its  proof. 

The  source  and  target  languages  may  be  viewed  as  each  containing  further  structures. 
In  the  case  of  Lisp,  the  source  language  for  MCO,  function  definitions  are  strung  together 
linearly  to  form  larger  units.  All  else  in  the  language  is  nested  calls  to  functions. 

The target language also deals with different levels of structure. The statements of the
target language were constructed with the Lisp-like functions, but were strung together to form
functions and programs with what we called files. These files are strictly linear lists built
strictly in order. The code of MCO has only an operation to add a statement to a file, none to
rearrange or take apart a file. Such limitations of the means of accessing a data structure are
typical of the data abstraction approach.

Other areas of MCO where data abstraction was in evidence were the following. The
location table of MCO was built as a Lisp ASSOC list, that is, a list of pairs, each pair being a
name and its associated location. The run-time stack was treated at the lowest levels as an
array with a top-of-stack pointer in parts of the proof. This was necessitated by the need to
access items not at the top during execution. What was really needed was a better abstraction
to describe stacks with access into them. Some abstraction was done in the stackok proofs
(proofs that the stack was preserved over certain operations).
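
The two abstractions just described, the location table as an association list and the file as an append-only list of statements, can be sketched as follows. This is an illustrative Python rendering under stated assumptions, not the code of MCO: all names (`loc_empty`, `loc_bind`, `loc_lookup`, `TargetFile`) are hypothetical, and locations are assumed to be integers.

```python
from typing import List, Optional, Tuple

# Hypothetical representation: an immutable ASSOC list of (name, location) pairs.
LocTable = Tuple[Tuple[str, int], ...]

def loc_empty() -> LocTable:
    """The empty location table."""
    return ()

def loc_bind(table: LocTable, name: str, location: int) -> LocTable:
    """Return a new table with (name, location) added at the front."""
    return ((name, location),) + table

def loc_lookup(table: LocTable, name: str) -> Optional[int]:
    """Return the location of the most recent binding of name, or None."""
    for bound_name, location in table:
        if bound_name == name:
            return location
    return None

class TargetFile:
    """Append-only list of target statements: nothing rearranges or removes them."""

    def __init__(self) -> None:
        self._statements: List[tuple] = []

    def add(self, statement: tuple) -> None:
        """The only mutating operation: add one statement at the end."""
        self._statements.append(statement)

    def contents(self) -> Tuple[tuple, ...]:
        """Read-only view of the statements in the order they were added."""
        return tuple(self._statements)
```

Restricting access to these few operations is what lets a small set of axioms (for example, that a lookup after a bind of the same name returns the bound location) serve as the whole interface to the rest of the program.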

It became obvious during the proof of MCO that isolating these various data structures
so that a set of axioms or assertions represents the only interface with the rest of the program
is essential to making large programs understandable. We believe that understandability is a
prerequisite to correctness and maintainability. What data abstractions we found (or put) in
MCO helped make its proof simpler; more would have been even better. Reaping the benefits
cited  above  from  the  Alphard  paper  in  proving  the  properties  of  these  structures  within 
compilers  would  indeed  ease  the  proving  task. 

6.7  Other  Useful  Methods 

The  case  was  made  for  a list  describing  language  in  Section  3.6.  We  believe  that  lack  of 
good  notation  severely  restricts  the  complexity  of  programs  that  can  be  proved.  Because  a 
compiler  is  essentially  tearing  down  source  language  and  building  up  target  language,  we  must 
have  notations  for  these  objects  (languages)  in  order  to  describe  the  actions  of  the  compiler. 
Computer  languages  are  basically  character  strings,  so  we  must  have  at  least  a notation  for 
character  strings.  But  most  languages  have  further  structure  to  them  to  indicate  nested  parts, 
and  thus  a list  (as  in  Lisp  lists)  notation  seems  suitable.  Hence  we  claim  a good  notation  to 
describe  the  structures  of  the  source  and  target  languages,  often  a list  notation,  is  essential  to 
proving  compilers.  Because  many  expressions  that  appear  in  the  assertions  are  also  in  the 
code,  we  recommend  the  inclusion  of  that  notation  in  the  language  in  which  a compiler  is 
written. 

The  proofs  here  used  the  two-part  proof  method,  that  is,  proving  first  what  the 
compiler  produces,  then  that  what  was  produced  was  semantically  correct  (as  described  in 
detail  in  Section  3.8).  It  divided  the  proof  into  two  parts  that  required  different  kinds  of 
proving  systems  and  different  parts  of  the  compiler  description.  The  interface  between  the 
parts  was  relatively  small.  Therefore  we  feel  this  method  is  an  important  aid  in  reducing  the 
complexity  of  compiler  proofs  and  in  managing  them. 

Nearly  all  the  proofs  of  operationally  expressed  compilers  have  been  made  using 
structural  induction.  In  other  words,  the  structure  of  the  proof  reflects  the  source  language 
syntactic  structure.  From  carrying  out  a number  of  program  proofs  (including  that  of  MCO) 
we  believe  that  a program  proof  is  more  easily  carried  out  if  the  structure  of  the  program’s 
code  parallels  the  structure  of  the  proof.  This  is  one  reason  we  believe  that  a program  and  its 
proof  should  be  constructed  together.  Then  that  parallel  between  code  and  proof  structure 
can  be  accomplished  without  the  need  to  force  the  structure  of  the  proof,  which  so  often 
happens  when  a proof  is  attempted  on  an  existing  program. 

If  compiler  code  is  made  to  reflect  the  syntactic  structure  of  the  source  language,  it  will 
then  parallel  the  structure  of  the  proof,  and  most  likely  ease  that  proof.  Thus  the  compiler 
should  be  built  to  reflect  the  source  language  structure,  such  as  MCO  does  as  a case  statement, 
each  case  compiling  a syntactic  type,  or  such  as  the  PL/0  compiler  does  in  having  a procedure 
to  compile  each  major  syntactic  type. 
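
A compiler organized as one case per syntactic type might look like the following minimal Python sketch. It is not MCO or the PL/0 compiler: the source language (integer constants, variable names, and binary sums written as tuples) and the stack-machine instructions `PUSHC`, `PUSHV`, and `ADD` are assumptions made for illustration only.

```python
def compile_expr(exp, loctable):
    """Compile an expression to a list of stack-machine instructions.

    Each branch handles exactly one syntactic type, so a proof by structural
    induction has one case per branch.
    """
    if isinstance(exp, int):                      # constant case
        return [("PUSHC", exp)]
    if isinstance(exp, str):                      # variable case
        return [("PUSHV", loctable[exp])]
    if isinstance(exp, tuple) and len(exp) == 3 and exp[0] == "+":   # sum case
        _, left, right = exp
        # The two recursive calls are where the inductive hypothesis is used.
        return compile_expr(left, loctable) + compile_expr(right, loctable) + [("ADD",)]
    raise ValueError(f"unknown syntactic type: {exp!r}")
```

For instance, compiling `("+", 1, ("+", "x", 2))` with `x` at location 0 yields the code for the constant, then the code for the inner sum, then a final `ADD`, mirroring the nesting of the source expression.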


7.1 Introduction

The  work  presented  here  has  enabled  a very  simple  compiler  (the  McCarthy-Painter 
one)  to  be  proved  almost  completely  interactively,  while  only  the  part  one  proof  of  the  more 
complex  MCO  was  done  with  machine  assistance.  One  case  of  a compiler  with  a different 
organizational  structure  (the  PL/0  compiler)  was  also  proved  with  the  same  methods  (though 
proved  without  machine  assistance,  for  the  reasons  given  before  the  proof).  The  motivation 
for  proving  compilers  given  in  Chapter  2 combined  with  the  success  we  have  had  with  the 
methods of compiler proof presented here leads us to recommend further research in applying
these methods. We feel the most important step toward applying them is to construct and use
a program proving system with the facilities required for an automated part two proof of a
more complex compiler. The other research which we will recommend would be difficult
without  it.  Such  a system  is  described  in  the  next  section. 

Next  we  would  recommend  research  toward  applying  these  methods  to  the  proof  of 
optimizing  compilers.  This  is  motivated  by  the  fact  that  nearly  all  working  compilers  produce 
optimizations in their target code of kinds not found in MCO. Therefore we include in this
chapter  a section  on  what  directions  such  research  should  take. 

Also  included  in  this  chapter  are  sections  about  further  theorem  proving  developments 
that  would  aid  compiler  proofs,  and  about  some  ideas  that  may  simplify  the  part  one  compiler 
proofs. 


7.2  Automating  the  Part  Two  Proof 

The two parts of the compiler proof are: part one to prove what the compiler produces,
and  part  two  to  prove  that  the  target  language  produced  has  the  same  effect  as  the  source 
language.  Part  two  is  carried  out  at  a meta  level  above  that  of  ordinary  verification  condition 
generators.  Some  of  the  target  code  produced  by  the  compiler  is  expressed  as  a function  of  the 
source  code  being  compiled  and  the  state  of  the  compiler  (symbol  table,  etc.).  Existing 
verification  condition  generators  operate  on  actual  code,  not  expressions  representing  code. 
This,  and  the  lack  of  other  necessary  features  described  below,  prevented  us  from  carrying  out 
the  part  two  proof  of  MCO  with  machine  assistance.  Some  of  the  features  were  actually  added 
to the Xivus system in order to carry out the interactive proof of the McCarthy-Painter
compiler. 

We  believe  a meta-level  verification  condition  generator  could  be  constructed  which 
would allow the part two proof to be done interactively. The major requirements of such a
system  would  be: 

1. The verification condition generator must indicate substitutions symbolically rather than
actually doing them.

2. The verification condition generator should be able to generate from an assertion
somewhat after a label back to the label to establish what the assertion is at that label
(at least where this is theoretically possible) to avoid having the user do the same by
hand.

3. The system must keep and use a great deal of type information about variables,
functions, and expressions.

4. The system should allow the language being processed and its Hoare rules to be user
specified.

5. The system should gather the results (target code produced) of the part one proof, the
statement of correctness of the compiler, and the syntactic case designations into a
procedure whose proof constitutes the part two proof of the compiler.

6. The verification condition generator should apply some simplification as it operates.

7. The system must print substitutions and the not-in (¬∈) property (see Section 3.6) in an
easily read form.

As  mentioned  above,  some  of  the  target  code  produced  by  the  compiler  is  expressed  in 
terms  of  the  source  code  being  compiled  and  the  state  of  the  compiler  (symbol  table,  etc.), 
which  means  that  we  cannot  use  ordinary  verification  condition  generators  for  the  part  two 
proof.  The  solution  to  this  problem  is  to  have  the  verification  condition  generator  indicate 
substitutions  symbolically  rather  than  doing  the  substitutions. 
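
The idea of indicating a substitution rather than performing it can be sketched in Python as follows. This is an illustration under stated assumptions, not the Xivus generator: assertions and expressions are plain strings, only assignment statements are handled, and the names `Subst`, `wp_assign`, `wp_sequence`, and `show` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Subst:
    """A pending substitution: body with expr substituted for var."""
    body: "Assertion"
    var: str
    expr: str

Assertion = Union[str, Subst]

def wp_assign(post: Assertion, var: str, expr: str) -> Assertion:
    """Hoare assignment rule, applied backwards, with the substitution left symbolic."""
    return Subst(post, var, expr)

def wp_sequence(post: Assertion, assignments: List[Tuple[str, str]]) -> Assertion:
    """Run backwards through a sequence of (var, expr) assignments."""
    for var, expr in reversed(assignments):
        post = wp_assign(post, var, expr)
    return post

def show(assertion: Assertion) -> str:
    """Print pending substitutions in the usual P[e/x] form."""
    if isinstance(assertion, Subst):
        return f"({show(assertion.body)})[{assertion.expr}/{assertion.var}]"
    return assertion
```

Leaving the substitution pending matters when the assertion body is itself only a symbol standing for an unknown formula, since there is then no text in which to carry the substitution out; a theorem prover can later simplify the pending substitutions using rules such as the subrules of Chapter 3.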

The type information is required in such a system because the variables, functions, and
even constants must be treated differently depending on whether they are source language,
target language, compiler code, or were introduced by the verification condition generator
itself. For example, the target code produced by MCO for the case of a function call is:

< ! FCOMPLIS(CDR(EXP),M,LOCTABLE)
  ! FLOADAC(1-LENGTH(CDR(EXP)), 1)
  < 'SUB 'P < 'C 0 0 LENGTH(CDR(EXP)) LENGTH(CDR(EXP)) > >
  < 'CALL LENGTH(CDR(EXP)) < 'E CAR(EXP) > >
>

The functions CDR, LENGTH, and subtract (-) were all actually called at compile time. That
is, they are simply expressions from the compiler code standing for some part of the target
code that was produced by the compiler. FCOMPLIS and FLOADAC are functions used
only in the assertions introduced into the compiler for the part one proof, but similarly stand
for pieces of target code produced by the compiler. Normally during verification condition
generation we must add into the verification condition some form of the Entry assertion of
each function call we encounter in the code being processed. Similarly we may use the Exit
assertion of such a function as a hypothesis in the verification condition. But that is only true
of functions actually executed in the code being processed by the verification condition
generator. Therefore our meta-level generator must distinguish compiler functions from target
language functions (by use of type information) to properly treat them.

Another place where we treat compiler functions differently is in the application of
subrules 6 and 13. They may only be applied to source language functions. Again we must
keep track of type in our part two proving system. It might be noted here that compiled
functions in target code have the same name as the source language function that was
compiled. Not having to translate function names from source to target language is inherent
in our compiler proof method. To require such translation would unduly complicate compiler
proofs. However, in our proving system we will keep carefully separated the properties of
source language and target language entities, and the context in verification conditions should
allow us to distinguish source from target language when necessary. There is nothing to
prevent the compiler function names from overlapping source language function names either.
Thus our system must be prepared to accommodate this complication, perhaps by the standard
trick of adding extensions to function names where necessary to distinguish the type.

Another place where we must distinguish type is in separating the target language
variables from the source language variables. Then a simple type check will suffice to apply
(at least to variables) subrules 1a, 1b, 1c, 2a, 2b, and 2c. These rules are simply statements of
the fact that source language variables and target language variables are different. If we
further distinguish the stack (m), the stack pointer (P), and the registers (Ri) as different types
within the target language, subrules 2d, 2e, and 2f become type checks. Obviously this type
information will be of value to the theorem prover in applying the simplification rules
(including some of the subrules), as well as of value to the verification condition generator.

We may get more benefit out of this type system by being able to determine types of
expressions as well as of variables. For instance, getv(MAP) in the proof of the McCarthy-Painter
compiler is used to extract a list of declared source language variables from the symbol
table MAP. Hence getv requires a type compiler-variable-standing-for-symbol-table as its
argument and produces a type source-language-variables-expression as its result. Using this
type information would immediately tell us that items of certain other types are not contained
in (¬∈) a getv(MAP), rather than going through an arduous application of subrules to
establish this fact. Typing of expressions will also allow us to apply all parts of subrule 1 to
expressions as well as variables by means of a type check.

A problem that arises with typing expressions is that we may get a mixed type. For
instance, if we have the expression m[P] we have both the stack type and the stack pointer
type. So our system needs to be able to keep track of mixed types such as the union of
target-stack type and target-pointer type. Such unions of types have been treated in language design
before, so they should present no major problems.
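
One way to picture union types and the type-based not-in check is the following Python sketch. It is an assumption-laden illustration, not a design taken from the proofs: every atomic name is assumed to carry exactly one type, a compound expression is assumed to be a tuple whose first element is an operator, and the names `VAR_TYPES`, `type_of`, and `surely_not_in` are hypothetical.

```python
SOURCE_VAR = "source-variable"
TARGET_STACK = "target-stack"
TARGET_POINTER = "target-pointer"
TARGET_CONST = "target-constant"

# Hypothetical declarations: the stack m, the stack pointer P, a source variable x.
VAR_TYPES = {"m": TARGET_STACK, "P": TARGET_POINTER, "x": SOURCE_VAR}

def type_of(expr) -> frozenset:
    """Return the set (union) of types occurring in an expression."""
    if isinstance(expr, str):
        return frozenset({VAR_TYPES[expr]})
    if isinstance(expr, int):
        return frozenset({TARGET_CONST})
    # Compound expression, e.g. ("index", "m", "P") for m[P]: union over the operands.
    result = frozenset()
    for operand in expr[1:]:
        result |= type_of(operand)
    return result

def surely_not_in(item, container) -> bool:
    """True when the types are disjoint, so item cannot occur in container."""
    return type_of(item).isdisjoint(type_of(container))
```

Here `m[P]` receives the union of the target-stack and target-pointer types, and the check that a source variable does not occur in it reduces to a set-disjointness test instead of a chain of subrule applications. A result of `False` means only that the type check is inconclusive.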

This handling of mixed types would bring us more flexibility in dealing with another
problem. That problem is that in many cases during the part two proof we have expressions
representing mixtures of target and source language terms. For example, after applying the c-k
substitution in the McCarthy-Painter proof, we have translated source language constants to
target language, but the expression still contains source language variables. A union of the
types source-variable and target-constant would allow us to type this expression.

The type mechanism should also distinguish between items that will be atomic variable
names and those that will be expressions in the source or target language assertions. For
example, in the MCO proof, the variables EXP and M are both compiler variables that
appear in the expressions representing target code that will be processed by the verification
condition generator. However, EXP will be a source language expression, while M will always
be an integer. Therefore we can say that M will never contain any source language variables,
but EXP probably will. That fact affects the way in which subrules may be applied to either
EXP or M.

The theorem prover of course must be able to access this type information in defining
and applying the rewrite rules and axioms. For instance, the system user must be able to
require when defining the subrules that those items that we have designated by the letter D in
the subrule list must be atomic identifiers, not expressions (in target or source language, that
is; they could perfectly well be expressions from the compiler code that always represent a
target or source variable). Since we have not defined substitution for an expression, nor are
the subrules valid if applied to such, any system that did not check that substitutions were
made only for identifiers would allow us to construct an erroneous proof.

The requirement of user definition of language and Hoare rules comes about for several
reasons. The language to be processed through the verification condition generator for a part
two proof is of course the target language of the compiler. Because of the great variety of
target languages, one would like to be able to change the language. This could be done by
using the techniques of parser generators, in which a description of the language to be parsed
is entered and a parser for that language is produced. A more pressing reason for desiring
this capability is that the language which the system must process is not simply the target
language. It is the target language plus the symbolic expressions from the compiler
representing target language. Therefore the language actually depends on the way the
compiler is written.

The user must be able to specify the Hoare rules that are used by the verification
condition generator to operate on the target language. This is required because some of the
Hoare rules depend on the way the compiler is written, not just on the target language. The
function interface, that is, where arguments are passed, where the result is returned, and how
execution control returns to the calling procedure, must show up in both the Hoare rule for
the call statement and the Hoare formula for the statement of correctness (which is used as a
Hoare proof rule during processing of the result of a recursive call to the compiling function).

For these reasons we believe that the system to automate part two proofs will have to
allow the user to specify each Hoare rule and the Hoare formula for the statement of
correctness of the compiler. The verification condition generator must then work directly from
these Hoare rules rather than having the Hoare rules built into its code. Otherwise a new
verification condition generator would have to be written for each new compiler, or even for a
change in the statement of what is to be proved about the compiler.
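
A generator that works from user-supplied rules instead of built-in ones could be organized roughly as in the Python sketch below. It is only an illustration of the table-driven idea: the target statements `SET` and `SKIP`, the string form of assertions, and the names `generate_precondition` and `example_rules` are all assumptions, not features of any existing system.

```python
from typing import Callable, Dict, List, Tuple

Statement = Tuple[str, ...]
# A rule maps (postcondition, statement) to the precondition of that statement.
Rule = Callable[[str, Statement], str]

def generate_precondition(code: List[Statement], post: str, rules: Dict[str, Rule]) -> str:
    """Walk backwards through code, applying the user's rule for each statement."""
    for statement in reversed(code):
        opcode = statement[0]
        if opcode not in rules:
            raise KeyError(f"no Hoare rule supplied for {opcode!r}")
        post = rules[opcode](post, statement)
    return post

# Rules for a hypothetical two-statement target language; substitutions are
# written symbolically as P[e/x] rather than carried out.
example_rules: Dict[str, Rule] = {
    "SET": lambda post, s: f"({post})[{s[2]}/{s[1]}]",
    "SKIP": lambda post, s: post,
}
```

Changing the target language, or the statement of correctness used as a proof rule, then means editing the rule table; the generator loop itself is untouched.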

The part two proof consists of generating verification conditions from the right-hand
side of the Hoare formula that is the statement of compiler correctness back (assuming a
backwards generator such as was used in this research) through the target code produced for
each case, and the proof of each resulting verification condition. The system for automating
the part two proof should set up a procedure containing these pieces such that proof of the
procedure constitutes a part two proof. The right-hand side of the Hoare formula that is the
statement of correctness becomes the Exit assertion. The left-hand side becomes the Entry
assertion. The code is then a case (or nested conditional) statement where the conditional test
for each case is the predicate denoting the various source language syntactic cases (ISSUM,
etc.), and the code executed for each case is the result of the part one proof. The verification
condition generation then produces one (or possibly more) verification condition to represent
each of the source language structural types required to make this structural induction proof.

A problem arises when two different representations exist for the same code. An
example of this occurs in the MCO proof in the AND case (subcase with arguments). There
we have one representation which shows the target code in recursive terms, that is, as
FCOMPEXP of the first argument and of the remaining arguments. This form readily
proves the part two proof for this subcase except for the problem of containing a goto to the
label L1, which is buried inside the FCOMPEXP of the remaining arguments. The second
representation of exactly the same code shows in detail where the labels lie. It is then possible
to run the verification condition generator back through the latter representation to
establish the assertions at the labels, discard the verification condition(s) produced, then run
the generator through the other representation of the same code to actually produce the
desired verification condition. The ability to accept two different representations of the same
code is one which could either be built into the verification condition generator or could be
handled during the part two set up.

Another problem that arises in setting up all syntactic cases of the part two proof as a
single procedure is that a label can appear in two different places. An example of this
appears in the MCO proof in the AND case. The same label L1 is used in both the subcase
with no arguments and the subcase with arguments. This is to be expected, since the code
produced is the same for the two subcases except for additional code for additional arguments.
The solution here is to relabel each subcase with unique sets of labels during part two set up.
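
The relabeling step can be sketched in Python as follows. This is an illustrative sketch only: it assumes each statement is a tuple whose first element is an opcode, that a label is defined by a hypothetical `("LABEL", name)` statement, and that no other operand happens to share a label's name. The opcode `JUMPE` and the function names are likewise assumptions.

```python
from typing import Dict, List, Tuple

Statement = Tuple[str, ...]

def relabel(code: List[Statement], suffix: str) -> List[Statement]:
    """Rename every label defined in code by appending suffix to it."""
    # Collect the labels defined in this subcase.
    labels = {stmt[1] for stmt in code if stmt[0] == "LABEL"}
    renamed = []
    for stmt in code:
        # Keep the opcode; rename any operand that is one of this subcase's labels.
        operands = tuple(f"{field}_{suffix}" if field in labels else field
                         for field in stmt[1:])
        renamed.append((stmt[0],) + operands)
    return renamed

def relabel_cases(cases: Dict[str, List[Statement]]) -> Dict[str, List[Statement]]:
    """Give each subcase its own label set by using the subcase name as suffix."""
    return {name: relabel(code, name) for name, code in cases.items()}
```

Because the suffix is the subcase name, two subcases that both use L1 end up with distinct labels, and the combined procedure contains each label only once.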

The part two set up will also have to accept Hoare formulas defining the action of
certain other compiler procedures in addition to the main compiling procedure. In the MCO
proof these other procedures were COMPLIS, LOADAC, and MKPUSH. The reason for
the special treatment of these procedures is that the target code they produced depended on
the number of arguments allowed for some source functions (AND, etc.). In other words,
proof of the code from these functions involved induction on the number of source language
arguments, which is a different induction than the structural induction on nested source
language functions. Hoare formulas were therefore written to express the effects of the
procedures COMPLIS, LOADAC, and MKPUSH in the proof of MCO. Whenever a piece
of target code was encountered during verification condition generation that was expressed as
the result of COMPLIS, LOADAC, or MKPUSH, the appropriate Hoare formula was
applied as a Hoare proof rule. This required, however, that these Hoare formulas be proved
correct. The same method was used as for the statement of correctness of the compiler; that is,
verification conditions were generated from the right-hand side of a recursive form of each
Hoare formula (for COMPLIS, LOADAC, and MKPUSH) back through the target code of
the respective procedure, and then the verification conditions were proved.

Because expressions tend to grow faster without being able to carry out substitutions, the
verification condition generator would have to simplify as it generated to keep expressions to
manageable size. This same complexity of the verification conditions is what requires the
system to have a good system of printing verification conditions for viewing by the interactive
user. Because of the extreme frequency with which substitutions and not-ins (¬∈) appear in
the verification conditions, special attention must be paid to printing them out in a readable
manner if the user is expected to interact during the proof in an intelligent manner.

The register notation used in the hand part two proof of MCO should be changed if the
proof were done by machine. The problem with it arises when applying the statement of correctness
as a Hoare proof rule during verification condition generation. Then it is required to
quantify an undetermined number (depending on what is in the syntactic types contained in
arguments) of registers. Actually all registers (except R1) could be quantified, since the target
code never depends on any higher registers to be saved. So in a machine proof we would
refer to all registers (except R1) collectively by use of an R array, and just quantify the entire
array.

Another problem area is that of applying the subrules 6 and 13 in a machine proof.
Careful examination of them shows that they contain quantification of function names. This
at first appears to require a higher-order logic in the theorem prover. However, it would
always instantiate the function to some given function in the theorem at the time of applying
the subrule. Thus no such quantification would ever enter the theorems being proved. The
provision must be made to quantify functions in the subrules when they are introduced and
when they are applied; otherwise subrules 6 and 13 would have to be entered into the system
again every time they were to be applied to a different function.


7.3  Proving  Optimizing  Compilers 

In  Section  3.12  it  was  pointed  out  that  the  methods  used  here  would  not  work  for 
optimizations  in  the  target  code  which  depend  on  the  context  in  which  a particular  source 
syntactic  type  is  found.  One  approach  to  working  with  such  optimizations  would  be  to  assign 
a name  and  predicate  to  each  such  context.  For  example,  in  CO,  the  compiler  on  which  MCO 
is based, boolean syntactic types (AND, OR, NOT) that lie immediately inside another
boolean type are compiled into shorter code than normal. We could name this context Inbool,
and the predicate which tells us if Inbool exists at a given point could be called Isinbool.
Isinbool would have to be a function of the program and of the point within that program
under discussion.

Then in the part one proof we would specify what the target code would be for
compiling  the  syntactic  type  in  question  for  both  the  case  where  the  context  predicate  is  true 
and  where  the  context  predicate  is  false.  The  statement  of  correctness  would  have  to  be 
prefixed  by  the  hypothesis  that  the  context  predicate  is  false,  and  another  statement  of 
correctness  would  be  added  that  is  prefixed  by  the  context  predicate  being  true.  The 
additional  statement  of  correctness  portion  must  then  state  exactly  the  conditions  (in  the  form 
of a Hoare formula in target language terms) that hold upon entry and exit of the optimized
target  code. 

It then becomes necessary during the part one proof to establish whether the context
predicate holds or not at each point that the compiler recurses. This information allows us to
know which part of the statement of correctness (the context predicate true or false part) is to
be assumed as the inductive assumption at each point in the part two proof at which we
encounter a recursive call to the compiler. The value of the context predicate could actually
be passed as an argument to the compiler at each call. However, compilers often do not
recurse and pass in the information that an optimization is to be made, but instead call a
different point in the compiler that does the optimization. In that case the part one proof
must show that what results is the same as would occur if the compiler recursed with the
information that the optimization was to take place. It would of course be easier to prove the
compiler if the recursion actually took place and the optimization information were passed
straightforwardly. Perhaps it would also add understandability to compilers if they were
written that way.

Another approach would be to prove semantic equivalence of the optimized form in its
special  context  to  the  unoptimized  form  without  benefit  of  the  context.  This  would  allow  us  to 
build  on  the  proof  we  have  already  done  rather  than  doing  it  all  again  with  a few  aspects 
changed.  An  example  of  an  equivalence  proof  between  code  produced  by  an  optimized 
version  and  an  unoptimized  version  of  a compiler  is  included  in  Section  A.9.  It  serves  as  an 
example  that  these  techniques  can  be  extended  to  prove  optimizing  compilers. 

7.4  More Powerful Theorem Prover

Much of the theorem prover work in the proof of MCO involved proof by contradiction
in the hypotheses of a theorem (verification condition). The same contradictions kept
appearing, which meant that execution of the compiler could not have followed that execution
path for the same reason it could not have followed similar paths. A command to "prove this
theorem like theorem X" would have saved a great amount of time. This could probably be
accomplished if a history of commands entered to the theorem prover were kept and
interrogated when the prover is asked to repeat proofs. It could even be done without the
human asking, if a search were made on new theorems to see whether they had the same or
stronger hypotheses and the same or weaker conclusions than those that were actually used in
previous proofs.
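
The search just described can be sketched as follows. This is only an illustration under strong simplifying assumptions: hypotheses and conclusions are treated as sets of atomic formulas written as strings, "stronger hypotheses" is approximated by "a superset of the hypotheses actually used", and "weaker conclusions" by "a subset of the conclusions actually proved". The names ProofRecord and proof_applies are hypothetical and are not part of the Xivus system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProofRecord:
    """What an earlier proof actually relied on (hypothetical bookkeeping)."""
    used_hypotheses: frozenset
    proved_conclusions: frozenset
    by_contradiction: bool = False  # proof found a contradiction in the hypotheses


def proof_applies(record, hypotheses, conclusions):
    """Return True if the recorded proof can be replayed on a new theorem.

    The new theorem must supply every hypothesis the old proof used.
    If the old proof was a contradiction in the hypotheses, the new
    conclusions are irrelevant; otherwise each must already be proved.
    """
    if not record.used_hypotheses <= frozenset(hypotheses):
        return False
    if record.by_contradiction:
        return True
    return frozenset(conclusions) <= record.proved_conclusions


# An old proof that only used the contradictory pair of hypotheses.
old = ProofRecord(frozenset({"ISNIL(EXP)", "NOT NULL(EXP)"}), frozenset(), True)
print(proof_applies(old, {"ISNIL(EXP)", "NOT NULL(EXP)", "M > 0"}, {"anything"}))
```

A purely syntactic check such as this one would miss theorems that are logically but not textually stronger; a real facility would need the prover itself to decide implication.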

The interactive part of the proof of MCO was not done without errors. Several times
proofs had to be redone with an error corrected in either the program or the assertions. The
Xivus system recognizes when a verification condition is regenerated exactly as before, and if
it has been proved before, the user is not asked to reprove it. However, many times the
verification condition is changed in parts that were not even used in the proof, such as a
conclusion in a proof by contradiction in the hypotheses. Other times the new version was
weaker than the old, and so the same proof applied. A powerful redo feature would help
greatly in the correction process, and could also be used to prove different versions of a
compiler, such as optimizing versions. Then only the portions of the compiler producing the
different or optimized code would need to be proved.

Another area where machine help would be welcome is in suggesting axioms, rewrite
rules, or equality substitutions to apply. Quite often a simple pattern match would find the
axiom or rewrite rule that bridges the gap between what we have (hypotheses) and what we
need (conclusions). Particularly when the expression of the theorem is quite large, such a
facility would save the human much time in looking through it. Many of the theorems in the
MCO proof were of the form hypotheses imply expression1 = expression2. Pattern matching
techniques to find the similarities in the two expressions and then pinpoint the differences
could easily produce good suggestions as to what axioms, rewrite rules, or substitutions to
apply.
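
As an illustration of pinpointing the differences between two expressions, the following sketch walks two terms in parallel and reports the smallest subterm pair that contains every point of disagreement. The representation is an assumption made for the example: a term is either a string (a variable or constant) or a tuple whose first element is the function name. The function names differences, common_prefix, subterm and focus are hypothetical.

```python
def differences(a, b, path=()):
    """List (path, left, right) for the outermost points where a and b differ."""
    if a == b:
        return []
    same_shape = (isinstance(a, tuple) and isinstance(b, tuple)
                  and len(a) == len(b) and a[0] == b[0])
    if same_shape:
        found = []
        for i in range(1, len(a)):
            found.extend(differences(a[i], b[i], path + (i,)))
        return found
    return [(path, a, b)]


def common_prefix(paths):
    """Longest argument path shared by all of the given paths."""
    prefix = paths[0]
    for p in paths[1:]:
        n = 0
        while n < len(prefix) and n < len(p) and prefix[n] == p[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def subterm(term, path):
    """Follow a path of argument positions down into a term."""
    for i in path:
        term = term[i]
    return term


def focus(a, b):
    """Smallest pair of corresponding subterms covering all differences."""
    diffs = differences(a, b)
    if not diffs:
        return None
    prefix = common_prefix([d[0] for d in diffs])
    return subterm(a, prefix), subterm(b, prefix)


lhs = ("RIGHTCONS", ("APPEND", "A", ("CONS", "B", "C")), "D")
rhs = ("RIGHTCONS", ("APPEND", ("RIGHTCONS", "A", "B"), "C"), "D")
print(focus(lhs, rhs))
```

In this example the two reported subterms are exactly the left and right sides of rewrite rule RCONS1 of Section A.3, so a lookup of the left subterm against the rule patterns would suggest that rule.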


7.5  Part  One  Proof  Reduction 

The stating of assertions for the part one proof of MCO was, for the most part, tedious
and required little insight. Once a good notation was used, the job became an easy one
of writing what target language the compiler produced for each source language syntactic case.
The only difficult assertions were the ones involving functions with a variable number of
arguments (AND, OR, and COND), since those assertions had to reflect the induction on the
number of arguments. Many of the simpler assertions were written by putting a sample of a
given syntactic type into the compiler and examining the compiled result. Such a compiled
result or a simple generalization of it was invariably the correct assertion.

This suggested a method of having the computer write most of the part one assertions,
thereby greatly reducing the size of the job done by the human. The idea is to symbolically
execute the code of the compiler on a symbolically expressed source syntactic type. The idea
of a symbolic executer is not new; King [King75, King76] and others have applied it with
essentially the features described below. Such a symbolic executer would have to be able to
backtrack down each possible execution path. Of course it would have to do theorem proving
at conditional statements to determine if the true condition or false condition branches would
always or never be taken. It would have to stop itself from descending too far on a series of
recursive calls. Perhaps this could be under user control, since the user would know where he
was breaking the recursion in the proof for which he was trying to generate assertions.

Another  idea  which  might  entirely  eliminate  the  part  one  proof  is  to  write  the  compiler 
in  a pattern  matching  language.  Each  source  language  syntactic  type  would  be  represented  by 
a pattern,  and  the  output  of  the  compiler  could  be  specified  as  a construction  made  from  the 
parts  that  matched  the  pieces  of  the  pattern.  A simple  example  of  a function  written  in  such  a 
manner  is  found  in  Hoare’s  paper  on  recursive  data  structures  [Hoare75,  p.  110].  Such  a 
compiler would look very much like a case statement (or the equivalent nested if-elseif
statement)  with  each  case  being  a rewrite  rule,  that  is,  a pattern  to  be  matched  followed  by  the 
construction  to  be  output.  Except  for  handling  such  matters  as  the  symbol  table  or  variable 
numbers  of  arguments,  the  compiler  written  in  such  a language  would  appear  almost  exactly 
like  the  assertions  that  had  to  be  proved  in  the  part  one  proof. 
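
To make the idea concrete, here is a minimal sketch of a compiler written as a list of rewrite rules, each a pattern paired with the construction to be output. It is not written in any particular pattern matching language and covers only three simple cases (NIL, T, and QUOTE); the encoding of source expressions as Python tuples and strings, the "?x" convention for pattern variables, and the names match, RULES and compile_exp are all assumptions made for the example.

```python
def match(pattern, term, env):
    """Return env extended with bindings if term matches pattern, else None."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in env:
            return env if env[pattern] == term else None
        return {**env, pattern: term}
    if isinstance(pattern, tuple):
        if not isinstance(term, tuple) or len(term) != len(pattern):
            return None
        for p, t in zip(pattern, term):
            env = match(p, t, env)
            if env is None:
                return None
        return env
    return env if pattern == term else None


# Each rule: (source pattern, function building the target instructions).
RULES = [
    ((), lambda b: [("MOVEI", 1, 0)]),                       # NIL
    ("T", lambda b: [("MOVEI", 1, ("QUOTE", "T"))]),         # T
    (("QUOTE", "?x"),
     lambda b: [("MOVEI", 1, ("QUOTE", b["?x"]))]),          # (QUOTE x)
]


def compile_exp(exp):
    """Emit the construction of the first rule whose pattern matches exp."""
    for pattern, build in RULES:
        bindings = match(pattern, exp, {})
        if bindings is not None:
            return build(bindings)
    raise ValueError("no rule matches %r" % (exp,))


print(compile_exp(("QUOTE", "A")))
```

Each entry of RULES reads almost directly as the corresponding part one assertion: the pattern names the syntactic case and the construction is the target code asserted for it.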
APPENDIX


A.1  Definitions of F Functions

The F function FCOMPEXP was, for every source language syntactic case or subcase,
defined by an assumption of the form:

< ! OUTFILE'
  ! FCOMPEXP(EXP, M, LOCTABLE)
>
=
[The expression found in figure 4-3 detailing the target language
instructions to be found in OUTFILE for this case or subcase]

For example, for the case of syntactic type AND, subcase more than zero arguments, we
will assume the following formula:

< ! OUTFILE'
  ! FCOMPEXP(EXP, M, LOCTABLE)
>
=
< ! OUTFILE'
  ! FCOMPANDOR(CDR(EXP), M, L1, FALSE, LOCTABLE)
  < 'MOVEI 1 < 'QUOTE 'T > >
  < 'JRST 0 L2 >
  L1
  < 'MOVEI 1 0 >
  L2
>


Assuming this subgoal in the theorem prover defines FCOMPEXP for this subcase as:

< ! FCOMPANDOR(CDR(EXP), M, L1, FALSE, LOCTABLE)
  < 'MOVEI 1 < 'QUOTE 'T > >
  < 'JRST 0 L2 >
  L1
  < 'MOVEI 1 0 >
  L2
>

The other F functions are given here with their defining assumptions in order of source
language syntactic type case.

function definition case:

< ! OUTFILE'
  ! FMKPUSH(N, M)
>
= OUTFILE'

where N < M.

< ! OUTFILE'
  ! FMKPUSH(N, M)
>
=
< ! OUTFILE'
  < 'PUSH 'P M >
  ! FMKPUSH(N, M+1)
>

where N ≥ M.


AND (no arguments) subcase:

< ! OUTFILE'
  ! FCOMPANDOR(U, M, L, FLG, LOCTABLE)
>
= OUTFILE'

AND (n > 0 arguments) subcase (FLG is FALSE):

< ! OUTFILE'
  ! FCOMPANDOR(U, M, L, FLG, LOCTABLE)
>
=
< ! OUTFILE'
  ! FCOMPEXP(CAR(U), M, LOCTABLE)
  < 'JUMPE 1 L >
  ! FCOMPANDOR(CDR(U), M, L, FLG, LOCTABLE)
>

OR (no arguments) subcase:

The same form applies in this subcase as did for AND (no arguments).

OR (n > 0 arguments) subcase (FLG is TRUE):

< ! OUTFILE'
  ! FCOMPANDOR(U, M, L, FLG, LOCTABLE)
>
=
< ! OUTFILE'
  ! FCOMPEXP(CAR(U), M, LOCTABLE)
  < 'JUMPN 1 L >
  ! FCOMPANDOR(CDR(U), M, L, FLG, LOCTABLE)
>


COND (no arguments) subcase:

< ! OUTFILE'
  ! FCOMCOND(U, M, L, LOCTABLE)
>
=
< ! OUTFILE'
  L
>

COND (n > 0 arguments) subcase:

< ! OUTFILE'
  ! FCOMCOND(U, M, L, LOCTABLE)
>
=
< ! OUTFILE'
  ! FCOMPEXP(CAAR(U), M, LOCTABLE)
  < 'JUMPE 1 L3 >
  ! FCOMPEXP(CADAR(U), M, LOCTABLE)
  < 'JRST L >
  L3
  ! FCOMCOND(CDR(U), M, L, LOCTABLE)
>


function call case:

< ! OUTFILE'
  ! FCOMPLIS(U, M, LOCTABLE)
>
= OUTFILE'

where U is a null argument list.

< ! OUTFILE'
  ! FCOMPLIS(U, M, LOCTABLE)
>
=
< ! OUTFILE'
  ! FCOMPEXP(CAR(U), M, LOCTABLE)
  < 'PUSH 'P 1 >
  ! FCOMPLIS(CDR(U), M-1, LOCTABLE)
>

where U is a non-null argument list.

< ! OUTFILE'
  ! FLOADAC(N, K)
>
= OUTFILE'

where N > 0.

< ! OUTFILE'
  ! FLOADAC(N, K)
>
=
< ! OUTFILE'
  < 'MOVE K N 'P >
  ! FLOADAC(N+1, K+1)
>

where N ≤ 0.

lambda case:

The same forms apply for FCOMPLIS in this case as did for the function call case.
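
The two-clause definitions above are ordinary recursive function definitions. As an illustration only, FMKPUSH and FLOADAC can be transcribed as follows, under the assumption that a sequence of target instructions is modelled as a Python list of tuples; the lower-case names fmkpush and floadac are hypothetical renderings, and the guards (N < M versus N ≥ M, N > 0 versus N ≤ 0) follow the tests IF N GE M and IF N LE 0 in the compiler listing of Section A.6.

```python
def fmkpush(n, m):
    """Instructions defined by FMKPUSH(N, M): push M, M+1, ..., N."""
    if n < m:
        return []                      # first clause: nothing is output
    return [("PUSH", "P", m)] + fmkpush(n, m + 1)


def floadac(n, k):
    """Instructions defined by FLOADAC(N, K): one MOVE for each N <= 0."""
    if n > 0:
        return []                      # first clause: nothing is output
    return [("MOVE", k, n, "P")] + floadac(n + 1, k + 1)


# COMPEXP calls LOADAC(1 - LENGTH(args), 1, ...); for two arguments N is -1.
print(floadac(-1, 1))
print(fmkpush(2, 1))
```

Each recursive call moves N and M (or N and K) one step toward the terminating clause, which is what lets the corresponding part one assertions be proved by induction.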

A.2  Axioms Describing Source Language Syntax

These axioms are given, along with labels for use by the theorem prover, in the order of
the syntactic cases. A dot after a variable name indicates universal quantification of that
variable over the entire axiom. ISEXPRESSION is a predicate which holds if and only if its
argument is a legal source language expression. A description of the axioms for the AND
syntactic type is given in Section 3.N. The axioms for the other syntactic types follow similar
patterns.

case of a function definition:

LISFUNDEF3:
    ISFUNCTIONDEF(X.) -> ISEXPRESSION(CADDDR(X.))

case of NIL:

LISNIL1:
    ISNIL(X.) -> NULL(X.)

case of T:

LIST1:
    IST(X.) -> X. = 'T

LIST2:
    IST(X.) -> NOT NULL(X.)

ISNUMBER case:

LISNUMBER1:
    ISNUMBER(X.) -> NUMBERP(X.)

LISNUMBER2:
    ISNUMBER(X.) -> NOT NULL(X.)

ISIDENTIFIER case:

LISIDENTIFIER1:
    ISIDENTIFIER(X.) -> ATOM(X.)

LISIDENTIFIER2:
    ISIDENTIFIER(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)

AND case:

LISAND1:
    ISAND(X.) -> CAR(X.) = 'AND

LISAND2:
    ISAND(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)
       ∧ NOT ATOM(X.)

LISAND3:
    ISEXPRESSION(X.)
    ∧ (CAR(X.) = 'AND)
    -> ISLISTOFEXP(CDR(X.))

LISAND4:
    ISLISTOFEXP(Z.) -> ISAND(CONS('AND, Z.))


OR case:

LISOR1:
    ISOR(X.) -> CAR(X.) = 'OR

LISOR2:
    ISOR(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)
       ∧ NOT ATOM(X.)
       ∧ (CAR(X.) ≠ 'AND)

LISOR3:
    ISEXPRESSION(X.)
    ∧ (CAR(X.) = 'OR)
    -> ISLISTOFEXP(CDR(X.))

LISOR5:
    ISLISTOFEXP(Z.) -> ISOR(CONS('OR, Z.))

NOT case:

LISNOT1:
    ISNOT(X.) -> CAR(X.) = 'NOT

LISNOT2:
    ISNOT(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)
       ∧ NOT ATOM(X.)
       ∧ (CAR(X.) ≠ 'AND)
       ∧ (CAR(X.) ≠ 'OR)

LISNOT3:
    ISEXPRESSION(X.)
    ∧ (CAR(X.) = 'NOT)
    -> ISEXPRESSION(CADR(X.))


COND case:

LISCOND1:
    ISCOND(X.) -> CAR(X.) = 'COND

LISCOND2:
    ISCOND(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)
       ∧ NOT ATOM(X.)
       ∧ (CAR(X.) ≠ 'AND)
       ∧ (CAR(X.) ≠ 'OR)
       ∧ (CAR(X.) ≠ 'NOT)

LISCOND3:
    ISEXPRESSION(X.)
    ∧ (CAR(X.) = 'COND)
    -> ISCONDLIST(CDR(X.))

LISCOND4:
    ISCONDLIST(X.) ∧ NOT NULL(X.) -> ISCONDLIST(CDR(X.))

LISCOND5:
    ISCONDLIST(X.)
    ∧ NOT NULL(X.)
    -> ISEXPRESSION(CAAR(X.))
       ∧ ISEXPRESSION(CADAR(X.))

LISCOND6:
    ISCONDLIST(Z.) -> ISCOND(CONS('COND, Z.))

QUOTE case:

LISQUOTE1:
    ISQUOTE(X.) -> CAR(X.) = 'QUOTE

LISQUOTE2:
    ISQUOTE(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)
       ∧ NOT ATOM(X.)
       ∧ (CAR(X.) ≠ 'AND)
       ∧ (CAR(X.) ≠ 'OR)
       ∧ (CAR(X.) ≠ 'NOT)
       ∧ (CAR(X.) ≠ 'COND)


case of a function call:

LISFUNCALL1:
    ISFUNCTIONCALL(X.) -> ATOM(CAR(X.))

LISFUNCALL2:
    ISFUNCTIONCALL(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)
       ∧ NOT ATOM(X.)
       ∧ (CAR(X.) ≠ 'AND)
       ∧ (CAR(X.) ≠ 'OR)
       ∧ (CAR(X.) ≠ 'NOT)
       ∧ (CAR(X.) ≠ 'COND)
       ∧ (CAR(X.) ≠ 'QUOTE)

LISFUNCALL3:
    ISEXPRESSION(X.)
    ∧ ATOM(CAR(X.))
    ∧ (CAR(X.) ≠ 'COND)
    -> ISLISTOFEXP(CDR(X.))

case of a lambda:

LISLAMBDA1:
    ISLAMBDA(X.) -> CAAR(X.) = 'LAMBDA

LISLAMBDA2:
    ISLAMBDA(X.)
    -> NOT NULL(X.)
       ∧ (X. ≠ 'T)
       ∧ NOT NUMBERP(X.)
       ∧ NOT ATOM(X.)
       ∧ (CAR(X.) ≠ 'AND)
       ∧ (CAR(X.) ≠ 'OR)
       ∧ (CAR(X.) ≠ 'NOT)
       ∧ (CAR(X.) ≠ 'COND)
       ∧ (CAR(X.) ≠ 'QUOTE)
       ∧ NOT ATOM(CAR(X.))

LISLAMBDA3:
    ISEXPRESSION(X.)
    ∧ (CAAR(X.) = 'LAMBDA)
    -> ISEXPRESSION(CADDAR(X.))

LISLAMBDA4:
    ISEXPRESSION(X.)
    ∧ (CAAR(X.) = 'LAMBDA)
    -> ISLISTOFEXP(CDR(X.))


general argument list axioms:

LISLISTOFEXP1:
    ISLISTOFEXP(X.)
    ∧ NOT NULL(X.)
    -> ISLISTOFEXP(CDR(X.))

LISLISTOFEXP2:
    ISLISTOFEXP(X.)
    ∧ NOT NULL(X.)
    -> ISEXPRESSION(CAR(X.))


A.3  Rewrite  Rules  Describing  Lisp  Functions 

These rules are given along with labels for use by the theorem prover. The dashed
right arrow ( --> ) indicates that the pattern to its left is replaced by the form to the right.
Variables followed by dots are ones that may be bound to any expression to cause a match of
the left side.

CONS, APPEND rules:

RCONS1:
    APPEND(XX., CONS(YY., ZZ.))
    --> APPEND(RIGHTCONS(XX., YY.), ZZ.)

RCONS2:
    APPEND(XX., 'NIL) --> XX.

RCONS3R:
    APPEND(XX., APPEND(YY., ZZ.))
    --> APPEND(APPEND(XX., YY.), ZZ.)

RCONS4:
    APPEND(XX., RIGHTCONS(YY., ZZ.))
    --> RIGHTCONS(APPEND(XX., YY.), ZZ.)

CDR of a CONS rule:

RCDRCONS:
    CDR(CONS(XX., YY.)) --> YY.

CAR-CDR combination rules:

RCAAR:
    CAAR(XX.) --> CAR(CAR(XX.))

RCADR:
    CADR(XX.) --> CAR(CDR(XX.))

RCDDR:
    CDDR(XX.) --> CDR(CDR(XX.))

RCAADR:
    CAADR(XX.) --> CAR(CAR(CDR(XX.)))

RCADAR:
    CADAR(XX.) --> CAR(CDR(CAR(XX.)))

RCADADR:
    CADADR(XX.) --> CAR(CDR(CAR(CDR(XX.))))
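
To show how rules of this kind are applied mechanically, the following sketch normalizes a term with the CAR-CDR combination rules and RCDRCONS. It is an illustration only: terms are assumed to be strings or tuples whose first element is the function name, and the names EXPANSIONS, rewrite_once and normalize are hypothetical. It does not model the matching machinery of the actual theorem prover.

```python
# Right-hand sides of the CAR-CDR combination rules, outermost function first.
EXPANSIONS = {
    "CAAR": ["CAR", "CAR"],
    "CADR": ["CAR", "CDR"],
    "CDDR": ["CDR", "CDR"],
    "CAADR": ["CAR", "CAR", "CDR"],
    "CADAR": ["CAR", "CDR", "CAR"],
    "CADADR": ["CAR", "CDR", "CAR", "CDR"],
}


def rewrite_once(term):
    """Apply one rule at the root of term; return None if no rule matches."""
    if not isinstance(term, tuple) or not term:
        return None
    head = term[0]
    if head in EXPANSIONS and len(term) == 2:
        result = term[1]
        for f in reversed(EXPANSIONS[head]):    # build from the inside out
            result = (f, result)
        return result
    if (head == "CDR" and len(term) == 2 and isinstance(term[1], tuple)
            and len(term[1]) == 3 and term[1][0] == "CONS"):
        return term[1][2]                       # RCDRCONS
    return None


def normalize(term):
    """Rewrite arguments first, then the root, until no rule applies."""
    if not isinstance(term, tuple) or not term:
        return term
    term = (term[0],) + tuple(normalize(arg) for arg in term[1:])
    rewritten = rewrite_once(term)
    return term if rewritten is None else normalize(rewritten)


print(normalize(("CADR", ("CONS", "A", ("CONS", "B", "NIL")))))
```

Because the listed rules include a CDR-of-CONS rule but no CAR-of-CONS rule, the example stops at CAR(CONS(B, NIL)) rather than reducing to B.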


A.4  Basis  for  Part  One  Proofs 

Following is a list of the source language syntactic description axioms and the rewrite
rules describing the Lisp-like functions which were required to prove the part one proof of
each syntactic type case. The axioms have names beginning with the letter L, and the rewrite
rules with R. The name before the colon in each list is the procedure or function name whose
proof requires the basis immediately following the colon. The full axioms and rewrite rules
are found in Section A.2 and Section A.3.


A.4.1  ISNIL Case

COMPEXP: RCONS1, RCONS2, LISNIL1, LISCOND3, LISFUNCALL3,
LISLAMBDA4, LISLAMBDA3.


A.4.2A  IST Case

COMPEXP: LIST2, LIST1, RCONS1, RCONS2, LISCOND3, LISFUNCALL3,
LISLAMBDA4, LISLAMBDA3.


A.4.2B  ISNUMBER Case

COMPEXP: LISNUMBER2, RCONS1, RCONS2, LISNUMBER1, LISCOND3,
LISFUNCALL3, LISLAMBDA4, LISLAMBDA3.


A.4.3  ISIDENTIFIER Case

COMPEXP: LISIDENTIFIER2, RCONS1, RCONS2, LISIDENTIFIER1, LISCOND3,
LISFUNCALL3, LISLAMBDA4, LISLAMBDA3.


A.4.4-0  ISAND No Arguments Subcase

COMPEXP: LISAND2, RCONS1, RCONS2, LISCOND3, LISAND1,
LISFUNCALL3, LISLAMBDA4, LISLAMBDA3.

COMBOOL: LISAND3, LISAND1, LISOR3, LISNOT3.

COMPANDOR: LISLISTOFEXP1, LISLISTOFEXP2.


A.4.4-N  ISAND With Arguments Subcase

COMPEXP: LISAND2, LISAND1, LISAND3, LISLISTOFEXP1, LISAND4,
RCONS1, RCONS2, RCONS4, RCDDR, RCADR, RCDRCONS, LISCOND3,
LISFUNCALL3, LISLAMBDA4, LISLAMBDA3.

COMBOOL: LISAND3, LISAND1, LISOR3, LISNOT3.

COMPANDOR: LISLISTOFEXP1, LISLISTOFEXP2, RCONS1, RCONS4,
RCONS3R.


A.4.5-0  ISOR No Arguments Subcase

COMPEXP: LISOR2, RCONS2, RCONS2, LISCOND3, LISOR1, LISFUNCALL3,
LISLAMBDA4, LISLAMBDA3.

COMBOOL: LISAND3, LISOR2, LISOR3, RCONS1, RCONS2, LISNOT3, LISOR1.

COMPANDOR: LISLISTOFEXP1, LISLISTOFEXP2.


A.4.5-N  ISOR With Arguments Subcase

COMPEXP: LISOR2, LISOR3, LISLISTOFEXP1, LISOR1, LISOR5, RCONS1,
RCONS2, RCONS4, RCDDR, RCADR, RCDRCONS, LISCOND3, LISFUNCALL3,
LISLAMBDA4, LISLAMBDA3.

COMBOOL: LISAND3, LISOR2, LISOR3, RCONS1, RCONS2, RCONS4,
LISNOT3, LISOR1.

COMPANDOR: LISLISTOFEXP1, LISLISTOFEXP2, RCONS1, RCONS4,
RCONS3R.


A.4.6  ISNOT Case

COMPEXP: LISNOT2, RCONS1, RCONS2, RCADR, RCONS3R, LISNOT1,
LISCOND3, LISFUNCALL3, LISLAMBDA4, LISLAMBDA3.

COMBOOL: LISAND3, LISNOT2, LISOR3, LISNOT3, RCONS1, RCONS2,
RCADR, RCONS3R, LISNOT1.


A.4.7-0  ISCOND No Arguments Subcase

COMPEXP: LISCOND2, LISCOND3, LISCOND1, LISFUNCALL3, LISLAMBDA4,
LISLAMBDA3.

COMCOND: RCONS1, RCONS2, LISCOND4, LISCOND5.


A.4.7-N  ISCOND With Arguments Subcase

COMPEXP: LISCOND2, LISCOND3, LISCOND4, LISCOND6, RCONS1, RCDDR,
RCDRCONS, RCONS3R, RCAADR, RCAAR, RCAADR, RCADADR, RCADAR,
LISCOND1, LISFUNCALL3, LISLAMBDA4, LISLAMBDA3.

COMCOND: LISCOND4, LISCOND5, RCONS1, RCONS3R, RCAAR, RCADAR.


A.4.8  ISQUOTE Case

COMPEXP: LISQUOTE2, LISCOND3, RCONS1, RCONS2, LISFUNCALL3,
LISQUOTE1, LISLAMBDA4, LISLAMBDA3.


A.4.9  ISFUNCTIONCALL Case

COMPEXP: LISFUNCALL2, LISCOND3, LISFUNCALL3, RCONS1, RCONS2,
RCONS3R, LISLAMBDA4, LISLAMBDA3, LISFUNCALL1.


A.4.10  ISLAMBDA Case

COMPEXP: LISLAMBDA2, LISCOND3, LISFUNCALL3, LISLAMBDA4,
LISLAMBDA3, RCONS1, RCONS2, RCONS3R, RCADAR, LISLAMBDA1.


A.4.11  ISFUNCTIONDEF Case

COMP: LISFUNDEF3, RCONS1, RCONS2, RCADR, RCONS3R.


A.5  Sketch of Part One Proofs

For each source language syntactic type case or subcase (except function definition,
which is compiled by COMP) the part one proof consists of proving the verification
conditions for procedure COMPEXP. The Xivus system produces nine verification
conditions because there are nine execution paths through COMPEXP. Eight of these paths
will not be taken for a given syntactic type case. The proof of each corresponding verification
condition is simply finding a contradiction in the hypotheses. For instance, we may have
hypotheses of ISNIL(EXP) and NOT NULL(EXP). In light of the syntactic description
axiom LISNIL1, this is a contradiction. Some of these verification conditions will, however,
have a conclusion or two which discharge the necessity of satisfying entry assertions of
functions or procedures called along that path. These conclusions are proved by
straightforward application of equality substitutions, rewrite rules, and syntactic description
axioms.
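
The contradiction step can be pictured with a small sketch. It is an illustration under simplifying assumptions: every literal is taken to be about the single variable EXP, so a literal is just a predicate name optionally prefixed by "NOT ", and only three of the axioms of Section A.2 are encoded, each as an (antecedent, consequent) pair. The names AXIOMS, negate and find_contradiction are hypothetical.

```python
# A few syntactic description axioms, each read as: antecedent -> consequent.
AXIOMS = {
    "LISNIL1": ("ISNIL", "NULL"),
    "LIST2": ("IST", "NOT NULL"),
    "LISNUMBER2": ("ISNUMBER", "NOT NULL"),
}


def negate(literal):
    """Return the complementary literal."""
    return literal[4:] if literal.startswith("NOT ") else "NOT " + literal


def find_contradiction(hypotheses):
    """Return the label of an axiom whose consequent contradicts a hypothesis.

    Only a single axiom application is tried per axiom; None means no
    contradiction was found this way.
    """
    facts = set(hypotheses)
    for label, (antecedent, consequent) in AXIOMS.items():
        if antecedent in facts:
            if negate(consequent) in facts:
                return label
            facts.add(consequent)
    return None


# The example from the text: ISNIL(EXP) and NOT NULL(EXP).
print(find_contradiction({"ISNIL", "NOT NULL"}))
```

For the hypotheses of the text the sketch reports LISNIL1, the axiom named above as exposing the contradiction.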

The remaining one verification condition of each case or subcase consists of a
conclusion representing the FCOMPEXP assertion, which may be assumed in order to define
FCOMPEXP, and one or more conclusions which are usually proved by straightforward
application of equality substitutions, rewrite rules, and axioms. The proofs of subsidiary
functions or procedures called by COMPEXP turn out to be quite similar to those of
COMPEXP in the ways mentioned above. The following is a sketch of how the one type of
part one proof that was not straightforward was carried out.

In the AND, OR, and COND (more than zero arguments) subcases, the following
technique was used. First the conclusions of the verification condition resulting from the
longest of the three assertion clauses and from the shortest are proved by straightforward
means. Then those conclusions are entered in slightly modified form into the theorem prover
as an axiom to be used to prove the medium length conclusion. For example, in the AND
subcase, the verification condition conclusions resulting from these assertions are proved first.

OUTFILE =
< ! OUTFILE'
  ! FCOMPEXP(EXP, M, LOCTABLE)
>

ISAND(EXP) ->
OUTFILE =
< ! OUTFILE'
  ! FCOMPANDOR(CDR(EXP), M, L1, FALSE, LOCTABLE)
  < 'MOVEI 1 < 'QUOTE 'T > >
  < 'JRST 0 L2 >
  L1
  < 'MOVEI 1 0 >
  L2
>

Then the following newly proved theorem is entered into the theorem prover as an
axiom.

ISAND(EXP.) ->
FCOMPEXP(EXP., M., LOCTABLE.) =
< ! FCOMPANDOR(CDR(EXP.), M., L1, FALSE, LOCTABLE.)
  < 'MOVEI 1 < 'QUOTE 'T > >
  < 'JRST 0 L2 >
  L1
  < 'MOVEI 1 0 >
  L2
>

The other of the three conclusions then proves easily by instantiating the arguments of
FCOMPEXP in this axiom to the arguments of the recursive use of FCOMPEXP on the
AND form with one less argument. (It may be recalled that the syntax and semantics of the
form <'AND a1 a2 ... an> are recursively defined in terms of the form <'AND a2 ... an>; hence
the use of the AND with one less argument that is being proved in the medium length
conclusion.)


A.6  Compiler  Listing 

Following is a listing of the compiler proved. The assertions are in the clisp-like
notation (before translation to the form acceptable to the proving system). The assertion from
figure T-8 appropriate for a particular source language syntactic case must be added to
procedure COMPEXP in the place marked. The assertion for the function definition case,
however, goes in the place marked in procedure COMP.

PROCEDURE COMP (EXP : LIST; VAR OUTFILE : FILE);
ENTRY ISFUNCTIONDEF(EXP);
EXIT [Use assertion for function definition case.];
VAR N : INTEGER;
BEGIN
  N := LENGTH(CADDR(EXP));
  OUTFILE := RIGHTCONS(OUTFILE,
               CONS(QLAP, CONS(CADR(EXP), CONS(QSUBR, QNIL))));
  MKPUSH(N, 1, OUTFILE);
  COMPEXP(CADDDR(EXP), -N, PRUP(CADDR(EXP), 1), OUTFILE);
  OUTFILE :=
    RIGHTCONS(OUTFILE,
      CONS(QSUB,
        CONS(QP,
          CONS(CONS(QC,
                 CONS(0, CONS(0, CONS(N, CONS(N, QNIL))))),
               QNIL))));
  OUTFILE := RIGHTCONS(OUTFILE, CONS(QPOPJ, CONS(QP, QNIL)));
  OUTFILE := RIGHTCONS(OUTFILE, QNIL);
END;


PROCEDURE MKPUSH (N, M : INTEGER; VAR OUTFILE : FILE);
ENTRY TRUE;
EXIT OUTFILE = < ! OUTFILE'
                 ! FMKPUSH(N, M)
               >
   & N < M -> OUTFILE = OUTFILE'
   & N ≥ M -> OUTFILE = < ! OUTFILE'
                          < 'PUSH 'P M >
                          ! FMKPUSH(N, M+1)
                        > ;
BEGIN
  IF N GE M
  THEN BEGIN
    OUTFILE := RIGHTCONS(OUTFILE, CONS(QPUSH, CONS(QP, CONS(M, QNIL))));
    MKPUSH(N, M+1, OUTFILE);
  END;
END;

FUNCTION PRUP (VARS : LIST; N : INTEGER) : LIST;
ENTRY TRUE;
EXIT NULL(VARS) -> PRUP = 'NIL
   & NOT NULL(VARS) -> PRUP = < <CAR(VARS) . N>
                                ! PRUP(CDR(VARS), N+1)
                              > ;
BEGIN
  IF NULL(VARS)
  THEN PRUP := QNIL
  ELSE PRUP := CONS(CONS(CAR(VARS), N), PRUP(CDR(VARS), N+1));
END;

FUNCTION RETRIEVE (EXP : LIST; M : INTEGER; LOCTABLE : LIST;
                   OUTFILE : FILE) : INTEGER;
ENTRY TRUE;
EXIT RETRIEVE(EXP, M, LOCTABLE, OUTFILE)
     = M + CDR(ASSOC(EXP, LOCTABLE)) ;
BEGIN
  RETRIEVE := M + CDR(ASSOC(EXP, LOCTABLE));
END;

PROCEDURE COMPEXP (EXP : LIST; M : INTEGER; LOCTABLE : LIST;
                   VAR OUTFILE : FILE);
ENTRY ISEXPRESSION(EXP);
EXIT OUTFILE = < ! OUTFILE'
                 ! FCOMPEXP(EXP, M, LOCTABLE)
               >
   & [Use assertion for each syntactic type case here] ;
VAR L1, L2, L5 : LIST;
BEGIN
  IF NULL(EXP)
  THEN OUTFILE := RIGHTCONS(OUTFILE, CONS(QMOVEI, CONS(1, CONS(0, QNIL))))
  ELSE IF (EXP = QT) OR NUMBERP(EXP)
  THEN OUTFILE :=
         RIGHTCONS(OUTFILE,
           CONS(QMOVEI,
             CONS(1,
               CONS(CONS(QQUOTE, CONS(EXP, QNIL)),
                    QNIL))))
  ELSE IF ATOM(EXP)
  THEN OUTFILE :=
         RIGHTCONS(OUTFILE,
           CONS(QMOVE,
             CONS(1,
               CONS(RETRIEVE(EXP, M, LOCTABLE, OUTFILE),
                 CONS(QP, QNIL)))))
  ELSE IF (CAR(EXP) = QAND) OR (CAR(EXP) = QOR) OR (CAR(EXP) = QNOT)
  THEN BEGIN
    GENSYM(L1);
    GENSYM(L2);
    COMBOOL(EXP, M, L1, LOCTABLE, OUTFILE);
    OUTFILE :=
      RIGHTCONS(OUTFILE,
        CONS(QMOVEI,
          CONS(1,
            CONS(CONS(QQUOTE, CONS(QT, QNIL)),
                 QNIL))));
    OUTFILE := RIGHTCONS(OUTFILE, CONS(QJRST, CONS(0, CONS(L2, QNIL))));
    OUTFILE := RIGHTCONS(OUTFILE, L1);
    OUTFILE := RIGHTCONS(OUTFILE, CONS(QMOVEI, CONS(1, CONS(0, QNIL))));
    OUTFILE := RIGHTCONS(OUTFILE, L2);
  END
  ELSE IF CAR(EXP) = QCOND
  THEN BEGIN
    GENSYM(L5);
    COMCOND(CDR(EXP), M, L5, LOCTABLE, OUTFILE);
  END
  ELSE IF CAR(EXP) = QQUOTE
  THEN OUTFILE := RIGHTCONS(OUTFILE,
                    CONS(QMOVEI, CONS(1, CONS(EXP, QNIL))))
  ELSE IF ATOM(CAR(EXP))
  THEN BEGIN
    COMPLIS(CDR(EXP), M, LOCTABLE, OUTFILE);
    LOADAC(1-LENGTH(CDR(EXP)), 1, OUTFILE);
    OUTFILE := RIGHTCONS(OUTFILE, RPOP(LENGTH(CDR(EXP))));
    OUTFILE :=
      RIGHTCONS(OUTFILE,
        CONS(QCALL,
          CONS(LENGTH(CDR(EXP)),
            CONS(CONS(QE, CONS(CAR(EXP), QNIL)),
                 QNIL))));
  END
  ELSE IF CAAR(EXP) = QLAMBDA
  THEN BEGIN
    COMPLIS(CDR(EXP), M, LOCTABLE, OUTFILE);
    COMPEXP(CADDAR(EXP), M-LENGTH(CDR(EXP)),
            ADDIDS(LOCTABLE, CADAR(EXP), 1-M), OUTFILE);
    OUTFILE := RIGHTCONS(OUTFILE, RPOP(LENGTH(CDR(EXP))));
  END;
END;

PROCEDURE COMPLIS (U : LIST; M : INTEGER; LOCTABLE : LIST;
                   VAR OUTFILE : FILE);
ENTRY ISLISTOFEXP(U);
EXIT OUTFILE = < ! OUTFILE'
                 ! FCOMPLIS(U, M, LOCTABLE)
               >
   & NULL(U) -> OUTFILE = OUTFILE'
   & NOT NULL(U) -> OUTFILE = < ! OUTFILE'
                                ! FCOMPEXP(CAR(U), M, LOCTABLE)
                                < 'PUSH 'P 1 >
                                ! FCOMPLIS(CDR(U), M-1, LOCTABLE)
                              > ;
BEGIN
  IF NOT NULL(U)
  THEN BEGIN
    COMPEXP(CAR(U), M, LOCTABLE, OUTFILE);
    OUTFILE := RIGHTCONS(OUTFILE, CONS(QPUSH, CONS(QP, CONS(1, QNIL))));
    COMPLIS(CDR(U), M-1, LOCTABLE, OUTFILE);
  END
END;

PROCEDURE LOADAC (N, K : INTEGER; VAR OUTFILE : FILE);
ENTRY TRUE;
EXIT OUTFILE = < ! OUTFILE'
                 ! FLOADAC(N, K)
               >
   & N > 0 -> OUTFILE = OUTFILE'
   & N ≤ 0 -> OUTFILE = < ! OUTFILE'
                          < 'MOVE K N 'P >
                          ! FLOADAC(N+1, K+1)
                        > ;
BEGIN
  IF N LE 0
  THEN BEGIN
    OUTFILE :=
      RIGHTCONS(OUTFILE, CONS(QMOVE, CONS(K, CONS(N, CONS(QP, QNIL)))));
    LOADAC(N+1, K+1, OUTFILE);
  END
END;


PROCEDURE COMCOND (U : LIST; M : INTEGER; L, LOCTABLE : LIST;
                   VAR OUTFILE : FILE);
ENTRY ISCONDLIST(U);
EXIT OUTFILE = < ! OUTFILE'
                 ! FCOMCOND(U, M, L, LOCTABLE)
               >
   & NULL(U) -> OUTFILE = < ! OUTFILE'
                            L
                          >
   & NOT NULL(U) -> OUTFILE = < ! OUTFILE'
                                ! FCOMPEXP(CAAR(U), M, LOCTABLE)
                                < 'JUMPE 1 L3 >
                                ! FCOMPEXP(CADAR(U), M, LOCTABLE)
                                < 'JRST L >
                                L3
                                ! FCOMCOND(CDR(U), M, L, LOCTABLE)
                              > ;
VAR L3 : LIST;
BEGIN
  IF NULL(U) THEN OUTFILE := RIGHTCONS(OUTFILE, L)
  ELSE BEGIN
    GENSYM(L3);
    COMPEXP(CAAR(U), M, LOCTABLE, OUTFILE);
    OUTFILE :=
      RIGHTCONS(OUTFILE, CONS(QJUMPE, CONS(1, CONS(L3, QNIL))));
    COMPEXP(CADAR(U), M, LOCTABLE, OUTFILE);
    OUTFILE := RIGHTCONS(OUTFILE, CONS(QJRST, CONS(L, QNIL)));
    OUTFILE := RIGHTCONS(OUTFILE, L3);
    COMCOND(CDR(U), M, L, LOCTABLE, OUTFILE);
  END
END;

PROCEDURE COMBOOL (P : LIST; M : INTEGER; L : LIST;
                   LOCTABLE : LIST; VAR OUTFILE : FILE);
ENTRY ISEXPRESSION(P);
EXIT ISAND(P) ∧ NULL(CDR(P)) -> OUTFILE = OUTFILE'
   & ISAND(P) ∧ NOT NULL(CDR(P))
       -> OUTFILE = < ! OUTFILE'
                      ! FCOMPEXP(CAR(CDR(P)), M, LOCTABLE)
                      < 'JUMPE 1 L >
                      ! FCOMPANDOR(CDR(CDR(P)), M, L, FALSE, LOCTABLE)
                    >
   & ISAND(P)
       -> OUTFILE = < ! OUTFILE'
                      ! FCOMPANDOR(CDR(P), M, L, FALSE, LOCTABLE)
                    >
   & ISOR(P) ∧ NULL(CDR(P))
       -> OUTFILE = < ! OUTFILE'
                      < 'JRST 0 L >
                      L4
                    >
   & ISOR(P) ∧ NOT NULL(CDR(P))
       -> OUTFILE = < ! OUTFILE'
                      ! FCOMPEXP(CAR(CDR(P)), M, LOCTABLE)
                      < 'JUMPN 1 L4 >
                      ! FCOMPANDOR(CDR(CDR(P)), M, L4, TRUE, LOCTABLE)
                      < 'JRST 0 L >
                      L4
                    >
   & ISOR(P)
       -> OUTFILE = < ! OUTFILE'
                      ! FCOMPANDOR(CDR(P), M, L4, TRUE, LOCTABLE)
                      < 'JRST 0 L >
                      L4
                    >
   & ISNOT(P)
       -> OUTFILE = < ! OUTFILE'
                      ! FCOMPEXP(CADR(P), M, LOCTABLE)
                      < 'JUMPN 1 L >
                    > ;
VAR L4 : LIST;
BEGIN
  IF CAR(P) = QAND
  THEN COMPANDOR(CDR(P), M, L, FALSE, LOCTABLE, OUTFILE)
  ELSE IF CAR(P) = QOR
  THEN BEGIN
    GENSYM(L4);
    COMPANDOR(CDR(P), M, L4, TRUE, LOCTABLE, OUTFILE);
    OUTFILE := RIGHTCONS(OUTFILE, CONS(QJRST, CONS(0, CONS(L, QNIL))));
    OUTFILE := RIGHTCONS(OUTFILE, L4);
  END
  ELSE IF CAR(P) = QNOT
  THEN BEGIN
    COMPEXP(CADR(P), M, LOCTABLE, OUTFILE);
    OUTFILE :=
      RIGHTCONS(OUTFILE, CONS(QJUMPN, CONS(1, CONS(L, QNIL))));
  END
END;


PROCEDURE COMPANDOR (U : LIST; M : INTEGER; L : LIST; FLG : BOOLEAN;
                     LOCTABLE : LIST; VAR OUTFILE : FILE);
ENTRY ISLISTOFEXP(U);
EXIT OUTFILE = < ! OUTFILE'
                 ! FCOMPANDOR(U, M, L, FLG, LOCTABLE)
               >
   & NULL(U) -> OUTFILE = OUTFILE'
   & NOT NULL(U) ∧ FLG
       -> OUTFILE = < ! OUTFILE'
                      ! FCOMPEXP(CAR(U), M, LOCTABLE)
                      < 'JUMPN 1 L >
                      ! FCOMPANDOR(CDR(U), M, L, FLG, LOCTABLE)
                    >
   & NOT NULL(U) ∧ NOT FLG
       -> OUTFILE = < ! OUTFILE'
                      ! FCOMPEXP(CAR(U), M, LOCTABLE)
                      < 'JUMPE 1 L >
                      ! FCOMPANDOR(CDR(U), M, L, FLG, LOCTABLE)
                    > ;
BEGIN
  IF NOT NULL(U)
  THEN IF FLG
    THEN BEGIN
      COMPEXP(CAR(U), M, LOCTABLE, OUTFILE);
      OUTFILE :=
        RIGHTCONS(OUTFILE, CONS(QJUMPN, CONS(1, CONS(L, QNIL))));
      COMPANDOR(CDR(U), M, L, FLG, LOCTABLE, OUTFILE);
    END
    ELSE BEGIN
      COMPEXP(CAR(U), M, LOCTABLE, OUTFILE);
      OUTFILE :=
        RIGHTCONS(OUTFILE, CONS(QJUMPE, CONS(1, CONS(L, QNIL))));
      COMPANDOR(CDR(U), M, L, FLG, LOCTABLE, OUTFILE);
    END
END;

FUNCTION RPOP (I : INTEGER) : LIST;
ENTRY TRUE;
EXIT RPOP(I) = < 'SUB 'P < 'C 0 0 I I > > ;
BEGIN
  RPOP := CONS(QSUB,
            CONS(QP,
              CONS(CONS(QC, CONS(0, CONS(0, CONS(I, CONS(I, QNIL))))),
                   QNIL)));
END;

FUNCTION ADDIDS (LOCTABLE, IDLIST : LIST; LASTLOC : INTEGER) : LIST;
ENTRY TRUE;
EXIT ADDIDS(LOCTABLE, IDLIST, LASTLOC)
     = < ! PRUP(IDLIST, LASTLOC)
         ! LOCTABLE > ;
BEGIN
  ADDIDS := APPEND(PRUP(IDLIST, LASTLOC), LOCTABLE);
END;


BEGIN  END 
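
The relationship between a compiler procedure and its F function can be illustrated with a small executable model. The sketch below transliterates only COMPANDOR and its F function FCOMPANDOR, under several assumptions: OUTFILE is modelled as a Python list that is extended in place, argument lists are Python lists, and the recursive compilation of each argument is replaced by a hypothetical stand-in, fcompexp_stub, since the full COMPEXP is not reproduced here.

```python
def fcompexp_stub(exp, m, loctable):
    """Hypothetical stand-in for FCOMPEXP: one placeholder instruction."""
    return [("CODE-FOR", exp)]


def fcompandor(u, m, l, flg, loctable):
    """F function: the instructions COMPANDOR is asserted to append."""
    if not u:
        return []
    jump = "JUMPN" if flg else "JUMPE"
    return (fcompexp_stub(u[0], m, loctable)
            + [(jump, 1, l)]
            + fcompandor(u[1:], m, l, flg, loctable))


def compandor(u, m, l, flg, loctable, outfile):
    """Imperative transliteration: appends instructions to outfile."""
    if u:
        outfile.extend(fcompexp_stub(u[0], m, loctable))  # stands in for COMPEXP
        outfile.append(("JUMPN" if flg else "JUMPE", 1, l))
        compandor(u[1:], m, l, flg, loctable, outfile)


# Exit assertion in miniature: OUTFILE = < ! OUTFILE' ! FCOMPANDOR(...) >
before = [("EARLIER", "CODE")]
outfile = list(before)
compandor(["a", "b"], 0, "L1", False, [], outfile)
print(outfile == before + fcompandor(["a", "b"], 0, "L1", False, []))
```

Running such a model on sample inputs is only a test, not a proof; the part one proof establishes the same equality for every argument list by induction on its length.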


A.7  Intrinsic  Lisp-like  Routines 


l ollowing  is  an  alphabetic  list  of  the  Lisp-like  functions  (and  one  procedure)  which 
were  treated  as  intrinsic  to  the  source  language  In  the  sense  that  their  code  was  not  supplied. 
T heir  properties  were  given  to  the  theorem  prover  as  rewrite  rules.  The  types  of  the  functions 
and  their  arguments  are  given  in  Pascal  format. 

FUNCTION  APPEND  (X1, X2 : LIST) : LIST;

FUNCTION  ASSOC  (X, Y : LIST) : LIST;

FUNCTION  ATOM  (X : LIST) : BOOLEAN;

FUNCTION  CAAR  (X : LIST) : LIST;

FUNCTION  CADAR  (X : LIST) : LIST;

FUNCTION  CADDAR  (X : LIST) : LIST;

FUNCTION  CADDDR  (X : LIST) : LIST;

FUNCTION  CADDR  (X : LIST) : LIST;

FUNCTION  CADR  (X : LIST) : LIST;

FUNCTION  CAR  (X : LIST) : LIST;

FUNCTION  CDR  (X : LIST) : LIST;

FUNCTION  CONS  (X, Y : LIST) : LIST;

PROCEDURE  GENSYM  (VAR X : LIST);

FUNCTION  LENGTH  (X : LIST) : INTEGER;

FUNCTION  NULL  (X : LIST) : BOOLEAN;

FUNCTION  NUMBERP  (X : LIST) : BOOLEAN;

FUNCTION  RIGHTCONS  (OUTFILE : FILE; STUFF : LIST) : FILE;
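
For readers unfamiliar with these routines, the following Python sketch models a few of them on cons cells represented as 2-tuples. It is an assumed model for illustration only: the helper `from_py` is invented here, and the reading of RIGHTCONS as "add STUFF as a new last element" is inferred from how COMPANDOR uses it, not taken from a definition in the text.

```python
NIL = None


def cons(x, y):
    return (x, y)


def car(x):
    return x[0]


def cdr(x):
    return x[1]


def null(x):
    return x is NIL


def append(x1, x2):
    # Copy x1 cell by cell and let the copy end in x2.
    return x2 if null(x1) else cons(car(x1), append(cdr(x1), x2))


def assoc(x, y):
    # First pair in association list y whose CAR equals x, else NIL.
    while not null(y):
        if car(car(y)) == x:
            return car(y)
        y = cdr(y)
    return NIL


def rightcons(outfile, stuff):
    # Assumed meaning: add STUFF as a new last element of OUTFILE.
    return append(outfile, cons(stuff, NIL))


def from_py(items):
    # Hypothetical helper: build a cons list from a Python list.
    out = NIL
    for item in reversed(items):
        out = cons(item, out)
    return out
```

A rewrite rule such as CAR(CONS(X, Y)) = X corresponds here to the identity `car(cons(x, y)) == x`.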


A.8  Part  Two  Proofs 

The following subsections contain the part two proofs for compiler MCO.




A.8.1  ISNIL  Case

We here prove the statement of correctness of the compiler for the case where ISNIL(S):

\[
\begin{aligned}
&\mathrm{ISEXPRESSION}(S) \rightarrow \\
&\quad \Big( \mathrm{Pre}(S)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(S)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0})\big) \\
&\qquad \{\, \mathrm{FCOMPEXP}(S,M,LOCTABLE) \,\}\; T \Big) \\
&\quad \wedge\; \mathrm{stackok}(\mathrm{FCOMPEXP}(S,M,LOCTABLE))
\end{aligned}
\]

where LOCTABLE is of form < < NAME1 . LOC1 > ... < NAMEr . LOCr > >,
v = < NAME1 ... NAMEr >, w = < m[M+P+LOC1] ... m[M+P+LOCr] >, and N(S) is a
function giving the maximum number of registers that are modified during execution of the
compilation of S.

To prove the Hoare rule portion we will apply Hoare rules to T for the statements of
FCOMPEXP for the particular case in question to form a verification condition. We will
apply simplifications to the verification condition during and after generation to reduce it to
TRUE.

The code produced by the compiler for this case is:

FCOMPEXP(S,M,LOCTABLE) = < < 'MOVEI 1 0 > >

Using the MOVEI Hoare rule:

\[ Q|^{Ri}_{y} \;\{\, \texttt{< 'MOVEI i y >} \,\}\; Q \]

gives us

\[ T|^{R1}_{0} \]
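
The MOVEI rule is a backward substitution: the precondition is the postcondition with Ri replaced by y. A minimal Python sketch, assuming assertions are modelled as predicates over a dictionary of register values (the names `wp_movei` and `run_movei` are invented for this illustration):

```python
def run_movei(i, y, state):
    """Execute <'MOVEI i y>: register i receives the constant y."""
    after = dict(state)
    after["R%d" % i] = y
    return after


def wp_movei(i, y, q):
    """Return the assertion Q with Ri replaced by y, as a predicate."""
    def pre(state):
        # Evaluating Q in the state after the move is the same as
        # evaluating Q[Ri := y] in the state before it.
        return q(run_movei(i, y, state))
    return pre


# Postcondition T for this illustration: register 1 holds 0.
T = lambda state: state["R1"] == 0
pre = wp_movei(1, 0, T)
print(pre({"R1": 7}))  # the precondition holds whatever R1 held before
```

Here `pre` plays the role of T with R1 replaced by 0: it no longer depends on the old contents of R1.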

The verification condition is then:

\[
\mathrm{Pre}(S)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(S)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0})\big) \rightarrow T|^{R1}_{0}
\]

We now expand Pre(S), Post(S), N(S), and S by the formulas which apply for the case
of NIL. This results in:

\[
\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \rightarrow T|^{R1}_{NIL}|^{v}_{w}|^{NIL}_{0}\big) \rightarrow T|^{R1}_{0}
\]

Since TRUE and NIL represent constants distinct from the source language variables in v, we
may drop all v substitutions by subrule 4. Similarly we may drop the NIL substitutions on
TRUE. Then carry out the NIL substitution on NIL, and simplify logically. The result is:

\[ T|^{R1}_{0} \rightarrow T|^{R1}_{0} \]

which is obviously TRUE.

To prove the stackok term of the statement of correctness we simply apply S6. This
completes the proof of the compiler for the ISNIL case.
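
The simplification steps used above (dropping a substitution whose variable does not occur, and carrying out a substitution that does apply) can be sketched in Python on terms represented as nested tuples. This is an assumed encoding for illustration; the subrules themselves are defined elsewhere in the dissertation, and `occurs` and `subst` are names invented here.

```python
def occurs(var, term):
    """True if the symbol VAR appears anywhere in TERM."""
    if isinstance(term, tuple):
        return any(occurs(var, part) for part in term)
    return term == var


def subst(term, var, value):
    """Replace every occurrence of VAR in TERM by VALUE."""
    if isinstance(term, tuple):
        return tuple(subst(part, var, value) for part in term)
    return value if term == var else term


# NIL does not occur in TRUE, so the substitution may be dropped.
print(subst("TRUE", "NIL", 0))
# NIL does occur here, so the substitution is carried out.
print(subst(("=", "R1", "NIL"), "NIL", 0))
```

When `occurs(var, term)` is false, `subst(term, var, value)` returns the term unchanged, which is the behaviour the proof relies on each time it drops a substitution.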

A.8.2A  IST  Case

We here prove the statement of correctness of the compiler for the case where IST(S).

To prove the Hoare rule portion we will again apply Hoare rules to the assertion T for
the statements of FCOMPEXP for the particular case in question to form a verification
condition. We will apply simplifications to the verification condition during and after
generation to reduce it to TRUE.

The code produced by the compiler for this case is:

FCOMPEXP(S,M,LOCTABLE) = < < 'MOVEI 1 <'QUOTE 'T> > >

Using the MOVEI Hoare rule:

\[ Q|^{Ri}_{y} \;\{\, \texttt{< 'MOVEI i y >} \,\}\; Q \]

gives us

\[ T|^{R1}_{\texttt{<'QUOTE 'T>}} \]

The verification condition is then:

\[
\mathrm{Pre}(S)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(S)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0})\big) \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}
\]

We now expand Pre(S), Post(S), N(S), and S by the formulas which apply for the case
of T. We may use <'QUOTE 'T> for S in place of the constant T, since the two forms mean
the same in both the source and target language. We choose the longer form here to avoid
confusion with the variable T used in the statement of correctness.

This results in:

\[
\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}|^{v}_{w}|^{NIL}_{0}\big) \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}
\]

In a manner similar to the proof of the NIL case, we simplify the verification condition to:

\[ T|^{R1}_{\texttt{<'QUOTE 'T>}} \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}} \]

which is obviously TRUE.

To prove the stackok term of the statement of correctness we simply apply S6. This
completes the proof of the compiler for the IST case.

A.8.2B  ISNUMBER  Case

We here prove the statement of correctness of the compiler for the case where
ISNUMBER(S).

We will apply Hoare rules to the assertion T for the statements of FCOMPEXP for the
particular case in question to form a verification condition. We will apply simplifications to
the verification condition during and after generation to reduce it to TRUE.

The code produced by the compiler for this case is:

FCOMPEXP(S,M,LOCTABLE) = < < 'MOVEI 1 <'QUOTE S> > >

Using the MOVEI Hoare rule:

\[ Q|^{Ri}_{y} \;\{\, \texttt{< 'MOVEI i y >} \,\}\; Q \]

gives us

\[ T|^{R1}_{\texttt{<'QUOTE S>}} \]

The verification condition is then:

\[
\mathrm{Pre}(S)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(S)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0})\big) \rightarrow T|^{R1}_{\texttt{<'QUOTE S>}}
\]

We now expand Pre(S), Post(S), and N(S) by the formulas which apply for the
ISNUMBER case. This results in:

\[
\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \rightarrow T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0}\big) \rightarrow T|^{R1}_{\texttt{<'QUOTE S>}}
\]

In a manner similar to the proof of the NIL case, we simplify the verification condition to:

\[ T|^{R1}_{S} \rightarrow T|^{R1}_{\texttt{<'QUOTE S>}} \]

For the case S being a constant, S and <'QUOTE S> mean the same thing, and thus the
verification condition is TRUE.

To prove the stackok term of the statement of correctness we simply apply S6. This
completes the proof of the compiler for the ISNUMBER case.


A.8.3  ISIDENTIFIER  Case 

We  here  prove  the  statement  of  correctness  of  the  compiler  for  the  case  where 
ISIDENTIFIER(S). 

We  will  apply  Hoare  rules  to  the  assertion  T for  the  statements  of  FCOMPEXP  for  the 
particular  case  in  question  to  form  a verification  condition.  We  will  apply  simplifications  to 
the  verification  condition  during  and  after  generation  to  reduce  it  to  TRUE. 

The  code  produced  by  the  compiler  for  this  case  is: 

FCOMPEXP(S,M,LOCTABLE) =
    < < 'MOVE 1 RETRIEVE(S,M,LOCTABLE,OUTFILE') 'P > >

We can see that whenever variables are declared (function definition or lambda cases),
the names of the new variables are added to LOCTABLE. In this case S is a variable, and
thus will be in LOCTABLE. Further, the most recent declaration of a variable name will be
the leftmost occurrence in LOCTABLE, since names are always added on the left. Thus
LOCTABLE will be of form
< <NAME1 . LOC1> ... <NAMEi . LOCi> ... <NAMEr . LOCr> > where NAMEi is the first
NAMEj such that S = NAMEj. Now RETRIEVE(S,M,LOCTABLE,OUTFILE') is (by
examination of its code) M+CDR(ASSOC(S,LOCTABLE)). This is (by executing the Lisp
functions CDR and ASSOC) M+LOCi.
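
The lookup just described can be sketched in Python. This is a simplified model under stated assumptions: LOCTABLE is a Python list of (name, location) pairs, the OUTFILE' argument of RETRIEVE is omitted because the text says the result is M+CDR(ASSOC(S,LOCTABLE)), and the function name `retrieve` is used only for this illustration.

```python
def retrieve(s, m, loctable):
    """Return M + LOCi for the leftmost entry of LOCTABLE named S."""
    for name, loc in loctable:
        if name == s:          # the leftmost (most recent) declaration wins
            return m + loc
    raise KeyError(s)          # the proof assumes S is always present


# "x" is declared twice; the leftmost entry shadows the older one.
table = [("x", 3), ("y", 2), ("x", 1)]
print(retrieve("x", 10, table))
```

Because new names are always added on the left, scanning from the left finds the most recent declaration first, which is why the proof may take NAMEi to be the first NAMEj equal to S.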

Using the MOVE Hoare rule:

\[ Q|^{Ri}_{m[P+j]} \;\{\, \texttt{< 'MOVE i j 'P >} \,\}\; Q \]

gives us

\[ T|^{R1}_{m[P+M+LOCi]} \]


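The complete verification condition is then:

\[
\mathrm{Pre}(S)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(S)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0})\big) \rightarrow T|^{R1}_{m[P+M+LOCi]}
\]

We now expand Pre(S), Post(S), and N(S) by the formulas which apply for the
ISIDENTIFIER case. This results in:

\[
\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \rightarrow T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0}\big) \rightarrow T|^{R1}_{m[P+M+LOCi]}
\]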


In a manner similar to the proof of the NIL case, we simplify the verification condition to:

\[ T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0} \rightarrow T|^{R1}_{m[P+M+LOCi]} \]

Now v = < NAME1 ... NAMEi ... NAMEr > and
w = < m[M+P+LOC1] ... m[M+P+LOCi] ... m[M+P+LOCr] > where NAMEi is the first
NAMEj such that S = NAMEj. Thus

\[ S|^{v}_{w} = m[M+P+LOCi] \]

We see NIL ∉ m[M+P+LOCi], and thus by subrule 4, we get:

\[ T|^{R1}_{m[M+P+LOCi]} \rightarrow T|^{R1}_{m[P+M+LOCi]} \]

which is obviously TRUE.




To  prove  the  stackok  term  of  the  statement  of  correctness  we  simply  apply  S5.  This 
completes  the  proof  of  the  compiler  for  the  ISIDENTIFIER  case. 

A.8.4  ISAND  Case 

We here prove the statement of correctness of the compiler for the case where
ISAND(S). We will prove it in two subcases: that of no arguments, and that of one or more.
The code produced by the compiler for the first subcase is:

FCOMPEXP(S,M,LOCTABLE) = < < 'MOVEI 1 <'QUOTE 'T> >
                           < 'JRST 0 L2 >
                           L1
                           < 'MOVEI 1 0 >
                           L2
                         >

The assertion T is to have Hoare rules applied for these statements. First we note that
assertion(L2) = T. Then we apply the MOVEI Hoare rule:

\[ Q|^{Ri}_{y} \;\{\, \texttt{< 'MOVEI i y >} \,\}\; Q \]

to obtain

\[ T|^{R1}_{0} \]

Then note that

\[ \mathrm{assertion}(L1) = T|^{R1}_{0} \]

Next we apply the JRST Hoare rule:

\[ \mathrm{assertion}(l) \;\{\, \texttt{< 'JRST 0 l >} \,\}\; Q \]

to obtain T. Now apply the MOVEI Hoare rule, resulting in

\[ T|^{R1}_{\texttt{<'QUOTE 'T>}} \]

Completing the verification condition generation we get:

\[
\mathrm{Pre}(S)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(S)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0})\big) \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}
\]

We now expand Pre(S), Post(S), and N(S) by the formulas which apply for this subcase,
and we let S = <'QUOTE 'T>. Note that the value for the AND with no arguments is the
Lisp equivalent of TRUE. This results in:

\[
\begin{aligned}
&\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \wedge \big((\texttt{<'QUOTE 'T>} = \texttt{<'QUOTE 'T>})|^{v}_{w}|^{NIL}_{0} \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}|^{v}_{w}|^{NIL}_{0}\big) \\
&\quad \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}
\end{aligned}
\]

In a manner similar to the proof of the NIL case, we simplify the verification condition to:

\[ T|^{R1}_{\texttt{<'QUOTE 'T>}} \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}} \]

which is obviously TRUE.
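
The backward pass just carried out by hand (record the assertion at each label, apply the MOVEI rule as a substitution, and let JRST pick up the assertion at its target) can be sketched in Python. This is a simplified model under stated assumptions: code is a list whose string items are labels, only MOVEI and forward JRST are handled, and an assertion is represented by the dictionary of substitutions applied to T (so `{}` stands for T itself). The function name is invented for this illustration.

```python
def backward_assertion(code):
    """Propagate the postcondition T backward through CODE.

    Returns the substitutions applied to T at the start of CODE.
    """
    assertion = {}   # label -> substitutions on T holding at that label
    q = {}           # current assertion; {} stands for T
    for item in reversed(code):
        if isinstance(item, str):        # a label: record assertion(label)
            assertion[item] = q
        elif item[0] == "MOVEI":         # Q[Ri := y] {MOVEI i y} Q
            _, i, y = item
            reg = "R%d" % i
            # If Ri was already replaced in Q it no longer occurs there,
            # so a second substitution on Ri has no effect.
            q = q if reg in q else {**q, reg: y}
        elif item[0] == "JRST":          # assertion(l) {JRST 0 l} Q
            q = assertion[item[2]]
    return q


and_no_args = [
    ("MOVEI", 1, ("QUOTE", "T")),
    ("JRST", 0, "L2"),
    "L1",
    ("MOVEI", 1, 0),
    "L2",
]
print(backward_assertion(and_no_args))
```

For the no-argument AND code the result is the single substitution of <'QUOTE 'T> for R1, matching the assertion obtained in the text before the verification condition is completed.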

The code produced by the compiler for the second subcase is:

FCOMPEXP(S,M,LOCTABLE) = < ! FCOMPEXP(b1,M,LOCTABLE)
                           < 'JUMPE 1 L1 >
                           ! FCOMPEXP(<'AND b2 ... bn>,M,LOCTABLE)
                         >

where S is of form < 'AND b1 b2 ... bn >.

Assuming the FCOMPEXP properties on < 'AND b2 ... bn >, a smaller portion of
source code (the inductive assumption), we get:

\[
\mathrm{Pre}(s)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(s)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}|^{v}_{w}|^{NIL}_{0})\big)
\]

where s = < 'AND b2 ... bn >. Next we wish to apply the JUMPE Hoare rule:

\[ (Ri = 0 \rightarrow \mathrm{assertion}(l)) \wedge (Ri \neq 0 \rightarrow Q) \;\{\, \texttt{< 'JUMPE i l >} \,\}\; Q \]




but we don't have assertion(L1) because L1 lies inside FCOMPEXP of s. From the assertions
proved in the part one proof we have:

FCOMPEXP(S,M,LOCTABLE) = < ...
                           L1
                           < 'MOVEI 1 0 >
                           L2
                         >

Thus assertion(L1) is always the same as in the subcase of n=0. The verification condition
thus becomes:

\[
(R1 = 0 \rightarrow T|^{R1}_{0}) \wedge \Big(R1 \neq 0 \rightarrow \mathrm{Pre}(s)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(s)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}|^{v}_{w}|^{NIL}_{0})\big)\Big)
\]

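We again assume the FCOMPEXP properties on a smaller portion of source code, this
time on b1. The result is:

\[
\begin{aligned}
&\mathrm{Pre}(b1)|^{v}_{w}|^{NIL}_{0} \wedge \Big(\mathrm{Post}(b1)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(b1) \\
&\quad \Big(\big((R1 = 0 \rightarrow T|^{R1}_{0}) \wedge (R1 \neq 0 \rightarrow \mathrm{Pre}(s)|^{v}_{w}|^{NIL}_{0} \wedge (\mathrm{Post}(s)|^{v}_{w}|^{NIL}_{0} \rightarrow \\
&\qquad \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}|^{v}_{w}|^{NIL}_{0})))\big)|^{R1}_{b1}|^{v}_{w}|^{NIL}_{0}\Big)\Big)
\end{aligned}
\]

Since this formula is getting rather large, we introduce a new notation. The
substitutions

\[ |^{v}_{w}|^{NIL}_{0} \]

will be abbreviated by *. The form

\[ X|^{x}_{y}\,* \]

will be understood to mean

\[ X|^{x}_{y}|^{v}_{w}|^{NIL}_{0} \]

The formula using this notation is then:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \Big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,\Big(\big((R1 = 0 \rightarrow T|^{R1}_{0}) \wedge \\
&\quad (R1 \neq 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}\,*)))\big)|^{R1}_{b1}\,*\Big)\Big)
\end{aligned}
\]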

Distribute the last R1 substitution as far as possible. Then apply subrule 12 and
subrule 4 to drop the b1 substitution on the first T term. Do the b1 substitution on R1=0 and
R1≠0. R1 ∉ Pre(s), Post(s), b1, s, w, and 0, so by subrule 3 we get R1 ∉ Pre(s) *, Post(s) *,
b1 *, and s *. So drop the R1 substitution on Pre(s) * and Post(s) * by subrule 4. Apply
subrule 18b to move the outer R1 substitution inside ∀R2,...,RN(s). Then drop the outer R1
substitution on the last T term by subrule 12 and subrule 4. The result is:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \Big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,\big((b1\,* = 0 \rightarrow T|^{R1}_{0}) \wedge \\
&\quad (b1\,* \neq 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}\,*)))\big)\Big)
\end{aligned}
\]

The same argument as used on R1 above establishes, for all i, Ri ∉ b1 *, Pre(s) *, and
Post(s) *. Thus we can apply subrule 16 repeatedly, then regroup the AND-IMPLY structure,
and finally apply subrule 17b to get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{0})\big) \\
&\wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*))\big)
\end{aligned}
\]

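Completing the verification condition generation we get:

\[
\begin{aligned}
&\mathrm{Pre}(S)\,* \wedge \big(\mathrm{Post}(S)\,* \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \rightarrow \\
&\quad \mathrm{Pre}(b1)\,* \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{0})\big) \\
&\quad \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*))\big)
\end{aligned}
\]

We now expand Pre(S) and Post(S) by the formulas which apply for this subcase.
Distributing the substitutions we get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge (b1\,* \neq 0 \rightarrow \mathrm{Pre}(s)\,*) \wedge \\
&\big((b1\,* = 0 \rightarrow S\,* = 0) \wedge (b1\,* \neq 0 \rightarrow S\,* = s\,*) \wedge \mathrm{Post}(b1)\,* \wedge \\
&\quad (b1\,* \neq 0 \rightarrow \mathrm{Post}(s)\,*) \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \\
&\rightarrow \\
&\mathrm{Pre}(b1)\,* \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{0})\big) \\
&\wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*))\big)
\end{aligned}
\]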

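Because Pre(b1) * appears in the hypotheses, we may set it to TRUE in the conclusion and
simplify logically. The Pre(s) * term is implied by a hypothesis and may be similarly treated.
The result is:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge (b1\,* \neq 0 \rightarrow \mathrm{Pre}(s)\,*) \wedge \\
&\big((b1\,* = 0 \rightarrow S\,* = 0) \wedge (b1\,* \neq 0 \rightarrow S\,* = s\,*) \wedge \mathrm{Post}(b1)\,* \wedge \\
&\quad (b1\,* \neq 0 \rightarrow \mathrm{Post}(s)\,*) \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \\
&\rightarrow \\
&\big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{0})\big) \wedge \\
&\big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \wedge \mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*)\big)
\end{aligned}
\]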

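We will break this into two cases: that of b1 = NIL, and that of b1 ≠ NIL.

Case 1: b1 = NIL. Then in the target language, b1 * = 0. We expand N(S) by the
formula which holds for this subcase. Substituting TRUE or FALSE according to the case 1
information, then logically simplifying, the verification condition becomes:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \big(S\,* = 0 \wedge \mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{S}\,*)\big) \rightarrow \\
&\quad \big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{0})\big)
\end{aligned}
\]

which is obviously TRUE.

Case 2: b1 ≠ NIL. Then b1 * ≠ 0. In a manner similar to case 1 we get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \mathrm{Pre}(s)\,* \wedge \big(S\,* = s\,* \wedge \mathrm{Post}(b1)\,* \wedge \mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{S}\,*)\big) \\
&\rightarrow \\
&\big(\mathrm{Post}(b1)\,* \wedge \mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*)\big)
\end{aligned}
\]

which is obviously TRUE.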

We now must prove the stackok term for this case. This we will also do by the two
subcases. For the no argument subcase we apply S3 to obtain the subgoals
stackok(< < 'MOVEI 1 <'QUOTE 'T> > >) and
stackok(< < 'JRST 0 L2 > L1 < 'MOVEI 1 0 > L2 >). The first is disposed of by applying S6.
For the second subgoal we use S8.

For the n argument subcase, we will first apply S3 to obtain the subgoals
stackok(FCOMPEXP(b1,M,LOCTABLE)) and
stackok(< < 'JUMPE 1 L1 > ! FCOMPEXP(s,M,LOCTABLE) >). The first is part of the
inductive assumption that these properties are true of smaller parts. The second subgoal
requires S9. This rule may be paraphrased as: If the stack is preserved both where you may
jump to and where you fall through, then it is preserved by the conditional jump structure as
a whole.

The two subgoals thus created are stackok(< < 'MOVEI 1 0 > L2 >) and
stackok(FCOMPEXP(s,M,LOCTABLE)). The first was derived by recalling that we
previously proved that FCOMPEXP(s,M,LOCTABLE) always ended in L1,
< 'MOVEI 1 0 >, and L2. We may prove this subgoal by applying S3 to get the subgoals
stackok(< < 'MOVEI 1 0 > >) and stackok(< L2 >). The first of these is proved by S6 and the
second by S7.

The remaining subgoal of stackok(FCOMPEXP(s,M,LOCTABLE)) is proved by
appealing to the inductive assumption.

This completes the proof of the compiler for the case of AND.
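
As an informal cross-check of the code shape proved correct above, the following Python sketch emits the AND code in the same pattern and runs it on a toy interpreter. It is a sketch under stated assumptions, not the dissertation's compiler: only register 1 is modelled, only the constants NIL and T, integers, and AND are compiled, and fresh labels are generated per AND so that nested ANDs do not share labels (the function names are invented here).

```python
import itertools

QUOTE_T = ("QUOTE", "T")


def fresh_labels():
    # Assumption of this sketch: every AND receives its own pair of labels.
    return ("L%d" % n for n in itertools.count(1))


def compile_exp(exp, labels):
    if exp == "NIL":
        return [("MOVEI", 1, 0)]
    if exp == "T":
        return [("MOVEI", 1, QUOTE_T)]
    if isinstance(exp, int):
        return [("MOVEI", 1, ("QUOTE", exp))]
    if exp[0] == "AND":
        l1, l2 = next(labels), next(labels)
        code = []
        for arg in exp[1:]:                  # one JUMPE per argument
            code += compile_exp(arg, labels) + [("JUMPE", 1, l1)]
        # Tail shared by all AND code: the no-argument subcase.
        return code + [("MOVEI", 1, QUOTE_T), ("JRST", 0, l2),
                       l1, ("MOVEI", 1, 0), l2]
    raise ValueError(exp)


def run(code):
    """Execute CODE and return the final contents of register 1."""
    where = {item: pos for pos, item in enumerate(code)
             if isinstance(item, str)}
    r1, pc = None, 0
    while pc < len(code):
        item = code[pc]
        pc += 1
        if isinstance(item, str):            # labels are no-ops
            continue
        op, _, arg = item
        if op == "MOVEI":
            r1 = arg
        elif op == "JRST" or (op == "JUMPE" and r1 == 0):
            pc = where[arg]
    return r1


print(run(compile_exp(("AND", "T", "NIL"), fresh_labels())))
```

An AND whose arguments are all non-NIL falls through every JUMPE and leaves <'QUOTE 'T> in register 1; the first NIL argument jumps to the first label and leaves 0, which mirrors the two cases of the proof.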


A.8.5  ISOR  Case 

We here prove the statement of correctness of the compiler for the case where ISOR(S).
The target code and proof of this case closely parallel the ISAND case. As with AND, we will
prove the case in two subcases: that of no arguments, and that of one or more.

The code produced by the compiler for the first subcase is:

FCOMPEXP(S,M,LOCTABLE) = < < 'JRST 0 L1 >
                           L4
                           < 'MOVEI 1 <'QUOTE 'T> >
                           < 'JRST 0 L2 >
                           L1
                           < 'MOVEI 1 0 >
                           L2
                         >

The assertion T is to have Hoare rules applied for these statements. First we note that
assertion(L2) = T. Then we apply the MOVEI Hoare rule:

\[ Q|^{Ri}_{y} \;\{\, \texttt{< 'MOVEI i y >} \,\}\; Q \]

to obtain

\[ T|^{R1}_{0} \]

Then note that

\[ \mathrm{assertion}(L1) = T|^{R1}_{0} \]

Next we apply the JRST Hoare rule:

\[ \mathrm{assertion}(l) \;\{\, \texttt{< 'JRST 0 l >} \,\}\; Q \]

to obtain T. Now apply the MOVEI Hoare rule, resulting in

\[ T|^{R1}_{\texttt{<'QUOTE 'T>}} \]

Now we see that

\[ \mathrm{assertion}(L4) = T|^{R1}_{\texttt{<'QUOTE 'T>}} \]

Then applying the JRST Hoare rule we get assertion(L1), which is

\[ T|^{R1}_{0} \]

Completing the verification condition generation we get:

\[
\mathrm{Pre}(S)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(S)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}|^{v}_{w}|^{NIL}_{0})\big) \rightarrow T|^{R1}_{0}
\]

We now expand Pre(S), Post(S), and N(S) by the formulas which apply for this subcase,
and we let S = NIL. Note that the value for the OR with no arguments is the Lisp equivalent
of FALSE. This results in:

\[
\mathrm{TRUE}|^{v}_{w}|^{NIL}_{0} \wedge \big((NIL = NIL)|^{v}_{w}|^{NIL}_{0} \rightarrow T|^{R1}_{NIL}|^{v}_{w}|^{NIL}_{0}\big) \rightarrow T|^{R1}_{0}
\]

In a manner similar to the proof of the NIL case, we simplify the verification condition to:

\[ T|^{R1}_{0} \rightarrow T|^{R1}_{0} \]

which is obviously TRUE.

The code produced by the compiler for the second subcase is:

FCOMPEXP(S,M,LOCTABLE) = < ! FCOMPEXP(b1,M,LOCTABLE)
                           < 'JUMPN 1 L4 >
                           ! FCOMPEXP(<'OR b2 ... bn>,M,LOCTABLE)
                         >

where S is of form < 'OR b1 b2 ... bn >.

Assuming the FCOMPEXP properties on < 'OR b2 ... bn >, a smaller portion of source
code (the inductive assumption), we get:

\[
\mathrm{Pre}(s)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(s)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}|^{v}_{w}|^{NIL}_{0})\big)
\]

where s = < 'OR b2 ... bn >. Next we wish to apply the JUMPN Hoare rule:

\[ (Ri \neq 0 \rightarrow \mathrm{assertion}(l)) \wedge (Ri = 0 \rightarrow Q) \;\{\, \texttt{< 'JUMPN i l >} \,\}\; Q \]

but we don't have assertion(L4) because L4 lies inside FCOMPEXP of s. From the assertions
proved in the part one proof we have:

FCOMPEXP(S,M,LOCTABLE) = < ...
                           L4
                           < 'MOVEI 1 <'QUOTE 'T> >
                           < 'JRST 0 L2 >
                           L1
                           < 'MOVEI 1 0 >
                           L2
                         >

Thus assertion(L4) is always the same as in the subcase of n=0. The verification condition
thus becomes:

\[
(R1 \neq 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}) \wedge \Big(R1 = 0 \rightarrow \mathrm{Pre}(s)|^{v}_{w}|^{NIL}_{0} \wedge \big(\mathrm{Post}(s)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}|^{v}_{w}|^{NIL}_{0})\big)\Big)
\]

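We again assume the FCOMPEXP properties on a smaller portion of source code, this
time on b1. The result is:

\[
\begin{aligned}
&\mathrm{Pre}(b1)|^{v}_{w}|^{NIL}_{0} \wedge \Big(\mathrm{Post}(b1)|^{v}_{w}|^{NIL}_{0} \rightarrow \forall R2,\ldots,RN(b1) \\
&\quad \Big(\big((R1 \neq 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}) \wedge (R1 = 0 \rightarrow \mathrm{Pre}(s)|^{v}_{w}|^{NIL}_{0} \wedge (\mathrm{Post}(s)|^{v}_{w}|^{NIL}_{0} \rightarrow \\
&\qquad \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}|^{v}_{w}|^{NIL}_{0})))\big)|^{R1}_{b1}|^{v}_{w}|^{NIL}_{0}\Big)\Big)
\end{aligned}
\]

Using * for the substitutions

\[ |^{v}_{w}|^{NIL}_{0} \]

we get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \Big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,\Big(\big((R1 \neq 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}) \wedge \\
&\quad (R1 = 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}\,*)))\big)|^{R1}_{b1}\,*\Big)\Big)
\end{aligned}
\]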

Distribute the last R1 substitution as far as possible. Then apply subrule 12 and
subrule 4 to drop the b1 substitution on the first T term. Do the b1 substitution on R1=0 and
R1≠0. R1 ∉ Pre(s), Post(s), b1, s, w, and 0, so by subrule 3 we get R1 ∉ Pre(s) *, Post(s) *,
b1 *, and s *. So drop the R1 substitution on Pre(s) * and Post(s) * by subrule 4. Apply
subrule 18b to move the outer R1 substitution inside ∀R2,...,RN(s). Then drop the outer R1
substitution on the last T term by subrule 12 and subrule 4. The result is:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \Big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,\big((b1\,* \neq 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}) \wedge \\
&\quad (b1\,* = 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,RN(s)\,(T|^{R1}_{s}\,*)))\big)\Big)
\end{aligned}
\]

The same argument as used on R1 above establishes, for all i, Ri ∉ b1 *, Pre(s) *, and
Post(s) *. Thus we can apply subrule 16 repeatedly, then regroup the AND-IMPLY structure,
and finally apply subrule 17b to get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{\texttt{<'QUOTE 'T>}})\big) \\
&\wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*))\big)
\end{aligned}
\]


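Completing the verification condition generation we get:

\[
\begin{aligned}
&\mathrm{Pre}(S)\,* \wedge \big(\mathrm{Post}(S)\,* \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \rightarrow \\
&\quad \mathrm{Pre}(b1)\,* \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{\texttt{<'QUOTE 'T>}})\big) \\
&\quad \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*))\big)
\end{aligned}
\]

We now expand Pre(S) and Post(S) by the formulas which apply for this subcase.
Distributing the substitutions we get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge (b1\,* = 0 \rightarrow \mathrm{Pre}(s)\,*) \wedge \\
&\big((b1\,* \neq 0 \rightarrow S\,* = \texttt{<'QUOTE 'T>}) \wedge (b1\,* = 0 \rightarrow S\,* = s\,*) \wedge \mathrm{Post}(b1)\,* \wedge \\
&\quad (b1\,* = 0 \rightarrow \mathrm{Post}(s)\,*) \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \\
&\rightarrow \\
&\mathrm{Pre}(b1)\,* \wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{\texttt{<'QUOTE 'T>}})\big) \\
&\wedge \big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \rightarrow \mathrm{Pre}(s)\,* \wedge (\mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*))\big)
\end{aligned}
\]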

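Because Pre(b1) * appears in the hypotheses, we may set it to TRUE in the conclusion and
simplify logically. The Pre(s) * term is implied by a hypothesis and may be similarly treated.
The result is:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge (b1\,* = 0 \rightarrow \mathrm{Pre}(s)\,*) \wedge \\
&\big((b1\,* \neq 0 \rightarrow S\,* = \texttt{<'QUOTE 'T>}) \wedge (b1\,* = 0 \rightarrow S\,* = s\,*) \wedge \mathrm{Post}(b1)\,* \wedge \\
&\quad (b1\,* = 0 \rightarrow \mathrm{Post}(s)\,*) \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \\
&\rightarrow \\
&\big(\mathrm{Post}(b1)\,* \wedge b1\,* \neq 0 \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{\texttt{<'QUOTE 'T>}})\big) \wedge \\
&\big(\mathrm{Post}(b1)\,* \wedge b1\,* = 0 \wedge \mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*)\big)
\end{aligned}
\]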

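We will break this into two cases: that of b1 ≠ NIL, and that of b1 = NIL.

Case 1: b1 ≠ NIL. Then in the target language, b1 * ≠ 0. We expand N(S) by the
formula which holds for this subcase. Substituting TRUE or FALSE according to the case 1
information, then logically simplifying, the verification condition becomes:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \big(S\,* = \texttt{<'QUOTE 'T>} \wedge \mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{S}\,*)\big) \rightarrow \\
&\quad \big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{\texttt{<'QUOTE 'T>}})\big)
\end{aligned}
\]

which is obviously TRUE.

Case 2: b1 = NIL. Then b1 * = 0. In a manner similar to case 1 we get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \mathrm{Pre}(s)\,* \wedge \big(S\,* = s\,* \wedge \mathrm{Post}(b1)\,* \wedge \mathrm{Post}(s)\,* \rightarrow \\
&\qquad \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{S}\,*)\big) \\
&\rightarrow \\
&\big(\mathrm{Post}(b1)\,* \wedge \mathrm{Post}(s)\,* \rightarrow \forall R2,\ldots,R\max(N(b1),N(s))\,(T|^{R1}_{s}\,*)\big)
\end{aligned}
\]

which is obviously TRUE.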

We now must prove the stackok term for this case. This we will also do by the two
subcases. For the no argument subcase we apply S3 to obtain the subgoals
stackok(< < 'JRST 0 L1 > ... L1 >) and stackok(< < 'MOVEI 1 0 > L2 >). The first is proved
by applying S8. For the second subgoal we use S3 again, then prove the resulting subgoals
with S6 and S7, respectively.

For the n argument subcase, we will first apply S3 to obtain the subgoals
stackok(FCOMPEXP(b1,M,LOCTABLE)) and
stackok(< < 'JUMPN 1 L4 > ! FCOMPEXP(s,M,LOCTABLE) >). The first is part of the
inductive assumption that these properties are true of smaller parts. The second subgoal
requires S9.

The two subgoals thus created are

stackok(< < 'MOVEI 1 <'QUOTE 'T> >
          < 'JRST 0 L2 >
          L1
          < 'MOVEI 1 0 >
          L2 >)

and stackok(FCOMPEXP(s,M,LOCTABLE)). The first was derived by recalling that we
previously proved that FCOMPEXP(s,M,LOCTABLE) always ended in L4 followed by the code
in this subgoal. We may prove this subgoal by applying S3 to get the subgoals
stackok(< < 'MOVEI 1 <'QUOTE 'T> > >) and stackok(< < 'JRST 0 L2 > ... L2 >). The first
of these is proved by S6 and the second by S8.

The remaining subgoal of stackok(FCOMPEXP(s,M,LOCTABLE)) is proved by
appealing to the inductive assumption.

This completes the proof of the compiler for the case of OR.


A.8.6  ISNOT  Case

We here prove the statement of correctness of the compiler for the case where
ISNOT(S). The code produced by the compiler for the case is:

FCOMPEXP(S,M,LOCTABLE) = < ! FCOMPEXP(b1,M,LOCTABLE)
                           < 'JUMPN 1 L1 >
                           < 'MOVEI 1 <'QUOTE 'T> >
                           < 'JRST 0 L2 >
                           L1
                           < 'MOVEI 1 0 >
                           L2
                         >

where S is of form < 'NOT b1 >.

The assertion T is to have Hoare rules applied for these statements. First we note that
assertion(L2) = T. Then we apply the MOVEI Hoare rule:

\[ Q|^{Ri}_{y} \;\{\, \texttt{< 'MOVEI i y >} \,\}\; Q \]

to obtain

\[ T|^{R1}_{0} \]

Then note that

\[ \mathrm{assertion}(L1) = T|^{R1}_{0} \]

Next we apply the JRST Hoare rule:

\[ \mathrm{assertion}(l) \;\{\, \texttt{< 'JRST 0 l >} \,\}\; Q \]

to obtain T. Now apply the MOVEI Hoare rule, resulting in

\[ T|^{R1}_{\texttt{<'QUOTE 'T>}} \]

We now apply the JUMPN Hoare rule:

\[ (Ri \neq 0 \rightarrow \mathrm{assertion}(l)) \wedge (Ri = 0 \rightarrow Q) \;\{\, \texttt{< 'JUMPN i l >} \,\}\; Q \]

to obtain:

\[ (R1 \neq 0 \rightarrow T|^{R1}_{0}) \wedge (R1 = 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}) \]

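We assume the FCOMPEXP properties on b1, a smaller portion of source code (the
inductive assumption). Then distributing the substitutions on R1, we get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \Big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1) \\
&\quad \big((b1\,* \neq 0 \rightarrow T|^{R1}_{0}|^{R1}_{b1}\,*) \wedge (b1\,* = 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}}|^{R1}_{b1}\,*)\big)\Big)
\end{aligned}
\]

By subrule 12 and subrule 4 we may drop the second of each pair of R1 substitutions. We
now complete the verification condition generation to obtain:

\[
\begin{aligned}
&\mathrm{Pre}(S)\,* \wedge \big(\mathrm{Post}(S)\,* \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \rightarrow \\
&\quad \mathrm{Pre}(b1)\,* \wedge \Big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1) \\
&\qquad \big((b1\,* \neq 0 \rightarrow T|^{R1}_{0}) \wedge (b1\,* = 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}})\big)\Big)
\end{aligned}
\]

We now expand Pre(S), Post(S), and N(S) by the formulas which apply for this case, to
get (after distributing substitutions):

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \big((b1\,* = NIL\,* \rightarrow S\,* = \texttt{<'QUOTE 'T>}\,*) \wedge \\
&\quad (b1\,* \neq NIL\,* \rightarrow S\,* = NIL\,*) \wedge \mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{S}\,*)\big) \\
&\rightarrow \\
&\mathrm{Pre}(b1)\,* \wedge \Big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1) \\
&\quad \big((b1\,* \neq 0 \rightarrow T|^{R1}_{0}) \wedge (b1\,* = 0 \rightarrow T|^{R1}_{\texttt{<'QUOTE 'T>}})\big)\Big)
\end{aligned}
\]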

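Now NIL * is 0 and <'QUOTE 'T> * is <'QUOTE 'T>. Since Pre(b1) * is a
hypothesis, we may eliminate it as a conclusion. We will break this into two cases: that of
b1 = NIL, and that of b1 ≠ NIL.

Case 1: b1 = NIL. Then b1 * = 0. We then substitute TRUE or FALSE according to
the case 1 information, then logically simplify to get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \big(S\,* = \texttt{<'QUOTE 'T>} \wedge \mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{S}\,*)\big) \rightarrow \\
&\quad \big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{\texttt{<'QUOTE 'T>}})\big)
\end{aligned}
\]

which is obviously TRUE.

Case 2: b1 ≠ NIL. Then b1 * ≠ 0. In a manner similar to case 1 we get:

\[
\begin{aligned}
&\mathrm{Pre}(b1)\,* \wedge \big(S\,* = 0 \wedge \mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{S}\,*)\big) \rightarrow \\
&\quad \big(\mathrm{Post}(b1)\,* \rightarrow \forall R2,\ldots,RN(b1)\,(T|^{R1}_{0})\big)
\end{aligned}
\]

which is obviously TRUE.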

We now must prove the stackok term for this case. We apply S3 to obtain the subgoals
stackok(FCOMPEXP(b1,M,LOCTABLE)) and

stackok(< < 'JUMPN 1 L1 >
          < 'MOVEI 1 <'QUOTE 'T> >
          < 'JRST 0 L2 >
          L1
          < 'MOVEI 1 0 >
          L2 >)

The first subgoal is part of the inductive assumption that these properties are true of smaller
parts. The second requires S9.

The two subgoals thus created are stackok(< < 'MOVEI 1 0 > L2 >) and

stackok(< < 'MOVEI 1 <'QUOTE 'T> >
          < 'JRST 0 L2 >
          L1
          < 'MOVEI 1 0 >
          L2 >)

We may prove the first subgoal by applying S3, then proving the resulting subgoals
with S6 and S7 respectively. The long subgoal above is proved by applying S3 to obtain the
subgoals stackok(< < 'MOVEI 1 <'QUOTE 'T> > >) and stackok(< < 'JRST 0 L2 > ... L2 >).
The first is proved by S6 and the second by S8.

This completes the proof of the compiler for the case of NOT.


A.8.7  ISCOND  Case

We here prove the statement of correctness of the compiler for the case where
ISCOND(S). We will prove it in two subcases: that of no arguments, and that of one or more.
The code produced by the compiler for the first subcase is:

FCOMPEXP(S,M,LOCTABLE) = < L5 >

The assertion T is to have the Hoare rule applied for this statement. First we note that
assertion(L5) = T. Completing the verification condition generation we get:

\[
\mathrm{Pre}(S)\,* \wedge \big(\mathrm{Post}(S)\,* \rightarrow \forall R2,\ldots,RN(S)\,(T|^{R1}_{S}\,*)\big) \rightarrow T
\]

We now expand Pre(S), Post(S), and N(S) by the formulas which apply for this subcase,
and let S = UNDEFINED, to obtain:

\[
\mathrm{TRUE}\,* \wedge \big((UNDEFINED = UNDEFINED)\,* \rightarrow T|^{R1}_{UNDEFINED}\,*\big) \rightarrow T
\]

This verification condition would simplify to TRUE if R1 ∉ T. Then subrule 4 would allow
us to drop the substitution. Indeed this is what we mean by saying that the value of S is
undefined. That is, the proof of any program cannot depend on the value of a no argument
COND, which is exactly what R1 represents in this subcase. Now since COND is defined
recursively in terms of a COND with fewer arguments, this means that no proof of a program
can depend on the value of a COND that executes all of its arguments without finding a
non-NIL one.

The code produced by the compiler for the second subcase is:

FCOMPEXP(S,M,LOCTABLE) = < ! FCOMPEXP(c1,M,LOCTABLE)
                           < 'JUMPE 1 L3 >
                           ! FCOMPEXP(d1,M,LOCTABLE)
                           < 'JRST 0 L5 >
                           L3
                           ! FCOMPEXP(<'COND <c2 d2> ... <cn dn>>,
                                      M,LOCTABLE)
                         >

where S is of form < 'COND <c1 d1> <c2 d2> ... <cn dn> >.
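
The recursive structure of this code can be made concrete with a short Python sketch of the emission pattern. It is an illustration under stated assumptions: `compile_exp` and `fresh_label` are hypothetical callbacks supplied by the caller, `stub_compile` is an invented stand-in for FCOMPEXP on the clause parts, and giving each clause its own L3-style label is an assumption of the sketch.

```python
def fcompcond(clauses, compile_exp, fresh_label, end="L5"):
    """Emit COND code: test, JUMPE past the body, body, JRST to END."""
    if not clauses:
        return [end]                       # no-argument COND: just < L5 >
    (c1, d1), rest = clauses[0], clauses[1:]
    l3 = fresh_label()                     # label of the next clause
    return (compile_exp(c1) + [("JUMPE", 1, l3)]
            + compile_exp(d1) + [("JRST", 0, end), l3]
            + fcompcond(rest, compile_exp, fresh_label, end))


def stub_compile(exp):
    # Hypothetical stand-in: treat each clause part as a constant load.
    return [("MOVEI", 1, exp)]


names = iter(["L3a", "L3b"])
code = fcompcond([("c1", "d1"), ("c2", "d2")], stub_compile,
                 lambda: next(names))
for item in code:
    print(item)
```

The emitted list always ends in the label L5, which is the property the proof obtains by induction on the number of arguments.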

Assuming the FCOMPEXP properties on < 'COND <c2 d2> ... <cn dn> >, a smaller
portion of source code (the inductive assumption), we get:

\[
\mathrm{Pre}(s)^{*} \land \bigl(\mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}})\bigr)
\]

where s = < 'COND <c2 d2> ... <cn dn> >. Now we note that

\[
\mathrm{assertion}(L3) = \mathrm{Pre}(s)^{*} \land \bigl(\mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}})\bigr)
\]

We wish to apply the JRST Hoare rule:

\[
\mathrm{assertion}(l)\ \{\, \langle \texttt{'JRST}\ 0\ l \rangle \,\}\ Q
\]

but the label L5 lies inside the final FCOMPEXP. By induction on n, the number of
arguments, we obtain:

FCOMPEXP(S,M,LOCTABLE) = < ... L5 >


Thus assertion(L5) is always T, and the verification condition becomes T. We again assume
the FCOMPEXP properties on a smaller piece of source code to obtain:

\[
\mathrm{Pre}(d_1)^{*} \land \bigl(\mathrm{Post}(d_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(d_1)}\,(T|^{R_1}_{d_1^{*}})\bigr)
\]

We  now  apply  the  JUMPE  Hoare  rule: 

\[
(R_i = 0 \rightarrow \mathrm{assertion}(l)) \land (R_i \neq 0 \rightarrow Q)\ \{\, \langle \texttt{'JUMPE}\ i\ l \rangle \,\}\ Q
\]


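to obtain:

\[
\begin{aligned}
&\bigl(R_1 = 0 \rightarrow \mathrm{Pre}(s)^{*} \land (\mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}}))\bigr) \;\land \\
&\bigl(R_1 \neq 0 \rightarrow \mathrm{Pre}(d_1)^{*} \land (\mathrm{Post}(d_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(d_1)}\,(T|^{R_1}_{d_1^{*}}))\bigr)
\end{aligned}
\]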

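Yet again we assume the FCOMPEXP properties on a smaller portion of source code to
get:

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land \Bigl(\mathrm{Post}(c_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(c_1)}\,\Bigl( \\
&\qquad \bigl((R_1 = 0 \rightarrow \mathrm{Pre}(s)^{*} \land (\mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}}))) \;\land \\
&\qquad\ (R_1 \neq 0 \rightarrow \mathrm{Pre}(d_1)^{*} \land (\mathrm{Post}(d_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(d_1)}\,(T|^{R_1}_{d_1^{*}})))\bigr)\Big|^{R_1}_{c_1^{*}} \Bigr)\Bigr)
\end{aligned}
\]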

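Distributing and performing the last R1 substitution we get:

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land \Bigl(\mathrm{Post}(c_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(c_1)}\,\Bigl( \\
&\qquad \bigl(c_1^{*} = 0 \rightarrow \mathrm{Pre}(s)^{*}|^{R_1}_{c_1^{*}} \land (\mathrm{Post}(s)^{*}|^{R_1}_{c_1^{*}} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}})|^{R_1}_{c_1^{*}})\bigr) \;\land \\
&\qquad \bigl(c_1^{*} \neq 0 \rightarrow \mathrm{Pre}(d_1)^{*}|^{R_1}_{c_1^{*}} \land (\mathrm{Post}(d_1)^{*}|^{R_1}_{c_1^{*}} \rightarrow \forall R_2,\ldots,R_{N(d_1)}\,(T|^{R_1}_{d_1^{*}})|^{R_1}_{c_1^{*}})\bigr) \Bigr)\Bigr)
\end{aligned}
\]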

Now R1 ¬∈ Pre(s), Post(s), Pre(d1), Post(d1), 0, and the stack elements of w. By subrule
3, R1 ¬∈ Pre(s)*, Post(s)*, Pre(d1)*, and Post(d1)*. Thus subrule 4 allows us to drop the
R1 substitutions on Pre(s)*, Post(s)*, Pre(d1)*, and Post(d1)*. A similar argument shows
us that Ri ¬∈ c1* for 2 ≤ i. Thus we can apply subrule 18 to move the c1* substitutions
inside the inner ∀'s. Similarly we get R1 ¬∈ s* and R1 ¬∈ d1*. Then subrule 12 and subrule
4 allow us to drop the substitutions we just moved in. The result is:

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land \Bigl(\mathrm{Post}(c_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(c_1)}\,\Bigl( \\
&\qquad \bigl(c_1^{*} = 0 \rightarrow \mathrm{Pre}(s)^{*} \land (\mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}}))\bigr) \;\land \\
&\qquad \bigl(c_1^{*} \neq 0 \rightarrow \mathrm{Pre}(d_1)^{*} \land (\mathrm{Post}(d_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(d_1)}\,(T|^{R_1}_{d_1^{*}}))\bigr) \Bigr)\Bigr)
\end{aligned}
\]

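We complete the verification condition generation, then expand Pre(S) and Post(S) by
the formulas which apply for this subcase. After distributing the * substitutions and
performing them where possible, we get:

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land (c_1^{*} \neq 0 \rightarrow \mathrm{Pre}(d_1)^{*}) \land (c_1^{*} = 0 \rightarrow \mathrm{Pre}(s)^{*}) \;\land \\
&\bigl(\mathrm{Post}(c_1)^{*} \land (c_1^{*} \neq 0 \rightarrow \mathrm{Post}(d_1)^{*} \land S^{*} = d_1^{*}) \land (c_1^{*} = 0 \rightarrow \mathrm{Post}(s)^{*} \land S^{*} = s^{*}) \rightarrow \forall R_2,\ldots,R_{N(S)}\,(T|^{R_1}_{S^{*}})\bigr) \\
&\rightarrow \\
&\mathrm{Pre}(c_1)^{*} \land \Bigl(\mathrm{Post}(c_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(c_1)}\,\Bigl( \\
&\qquad \bigl(c_1^{*} = 0 \rightarrow \mathrm{Pre}(s)^{*} \land (\mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}}))\bigr) \;\land \\
&\qquad \bigl(c_1^{*} \neq 0 \rightarrow \mathrm{Pre}(d_1)^{*} \land (\mathrm{Post}(d_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(d_1)}\,(T|^{R_1}_{d_1^{*}}))\bigr) \Bigr)\Bigr)
\end{aligned}
\]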

Since Pre(c1)* is a hypothesis, we may drop it as a conclusion. We will now break this
into two cases: that of c1 = NIL, and that of c1 ≠ NIL.

Case 1: c1 = NIL. Then c1* = 0. We expand N(S) by the formula which holds for this
subcase. Substituting TRUE or FALSE according to the Case 1 information, then logically
simplifying, the verification condition becomes:

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land \mathrm{Pre}(s)^{*} \;\land \\
&\bigl(\mathrm{Post}(c_1)^{*} \land \mathrm{Post}(s)^{*} \land S^{*} = s^{*} \rightarrow \forall R_2,\ldots,R_{\max(N(c_1),N(s))}\,(T|^{R_1}_{S^{*}})\bigr) \\
&\rightarrow \\
&\bigl(\mathrm{Post}(c_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(c_1)}\,\bigl(\mathrm{Pre}(s)^{*} \land (\mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{N(s)}\,(T|^{R_1}_{s^{*}}))\bigr)\bigr)
\end{aligned}
\]

Since for all i (2 ≤ i) Ri ¬∈ Pre(s), Post(s), 0, and the stack elements of w, we may apply
subrule 3 to conclude that Ri ¬∈ Pre(s)* and Post(s)*. Thus we may apply subrule 16
repeatedly to move the outer ∀ inward to the inner one. Then apply subrule 17b. Since
Pre(s)* is a hypothesis, we may replace it with TRUE in the conclusion and logically
simplify. The result is:

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land \mathrm{Pre}(s)^{*} \;\land \\
&\bigl(\mathrm{Post}(c_1)^{*} \land \mathrm{Post}(s)^{*} \land S^{*} = s^{*} \rightarrow \forall R_2,\ldots,R_{\max(N(c_1),N(s))}\,(T|^{R_1}_{S^{*}})\bigr) \\
&\rightarrow \\
&\bigl(\mathrm{Post}(c_1)^{*} \land \mathrm{Post}(s)^{*} \rightarrow \forall R_2,\ldots,R_{\max(N(c_1),N(s))}\,(T|^{R_1}_{s^{*}})\bigr)
\end{aligned}
\]

which is obviously TRUE.

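Case 2: c1 ≠ NIL. Then c1* ≠ 0. We expand N(S) again, then replace the case
information by TRUE or FALSE, and logically simplify.

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land \mathrm{Pre}(d_1)^{*} \;\land \\
&\bigl(\mathrm{Post}(c_1)^{*} \land \mathrm{Post}(d_1)^{*} \land S^{*} = d_1^{*} \rightarrow \forall R_2,\ldots,R_{\max(N(c_1),N(d_1))}\,(T|^{R_1}_{S^{*}})\bigr) \\
&\rightarrow \\
&\bigl(\mathrm{Post}(c_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(c_1)}\,\bigl(\mathrm{Pre}(d_1)^{*} \land (\mathrm{Post}(d_1)^{*} \rightarrow \forall R_2,\ldots,R_{N(d_1)}\,(T|^{R_1}_{d_1^{*}}))\bigr)\bigr)
\end{aligned}
\]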

Since for all i (2 ≤ i) Ri ¬∈ Pre(d1), Post(d1), 0, and the stack elements of w, we may
apply subrule 3 to conclude that Ri ¬∈ Pre(d1)* and Post(d1)*. Thus we may apply subrule
16 repeatedly to move the outer ∀ inward to the inner one. Then apply subrule 17b. Since
Pre(d1)* is a hypothesis, we may replace it with TRUE in the conclusion and logically
simplify. The result is:

\[
\begin{aligned}
&\mathrm{Pre}(c_1)^{*} \land \mathrm{Pre}(d_1)^{*} \;\land \\
&\bigl(\mathrm{Post}(c_1)^{*} \land \mathrm{Post}(d_1)^{*} \land S^{*} = d_1^{*} \rightarrow \forall R_2,\ldots,R_{\max(N(c_1),N(d_1))}\,(T|^{R_1}_{S^{*}})\bigr) \\
&\rightarrow \\
&\bigl(\mathrm{Post}(c_1)^{*} \land \mathrm{Post}(d_1)^{*} \rightarrow \forall R_2,\ldots,R_{\max(N(c_1),N(d_1))}\,(T|^{R_1}_{d_1^{*}})\bigr)
\end{aligned}
\]

which is obviously TRUE.

We now must prove the stackok term for this case. This we will also do by the two
subcases. For the no argument subcase we apply S7.

For the n argument subcase, we will first apply S3 to obtain the subgoals
stackok(FCOMPEXP(c1,M,LOCTABLE)) and

stackok(< < 'JUMPE 1 L3 >
          ! FCOMPEXP(d1,M,LOCTABLE)
          < 'JRST L5 >
          L3
          ! FCOMPEXP(s,M,LOCTABLE)
        >)

The first subgoal is part of the inductive assumption that these properties are true of
smaller parts. The second requires S9. The two subgoals thus created are
stackok(FCOMPEXP(s,M,LOCTABLE)) and

stackok(< ! FCOMPEXP(d1,M,LOCTABLE)
          < 'JRST L5 >
          L3
          ! FCOMPEXP(s,M,LOCTABLE)
        >)

The first subgoal is proved by appealing to the inductive assumption. The second
requires S3. The first subgoal so produced is proved by the inductive assumption, while the
second requires S8. In order to apply S8, we have to recall that we previously proved that
FCOMPEXP(s,M,LOCTABLE) will end with L5.

This completes the proof of the compiler for the case of COND.


A.8.8 ISQUOTE Case

We here prove the statement of correctness of the compiler for the case where
ISQUOTE(S). The code produced by the compiler for this case is:

FCOMPEXP(S,M,LOCTABLE) = < < 'MOVEI 1 <'QUOTE b1> > >

where S is of form <'QUOTE b1>.

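The assertion T is to have the Hoare rule applied for this statement. Applying the
MOVEI Hoare rule:

\[
Q|^{R_i}_{y}\ \{\, \langle \texttt{'MOVEI}\ i\ y \rangle \,\}\ Q
\]

we get

\[
T|^{R_1}_{\langle \texttt{'QUOTE}\ b_1 \rangle}
\]

Completing the verification condition generation we obtain:

\[
\mathrm{Pre}(S)^{*} \land \bigl(\mathrm{Post}(S)^{*} \rightarrow \forall R_2,\ldots,R_{N(S)}\,(T|^{R_1}_{S^{*}})\bigr) \;\rightarrow\; T|^{R_1}_{\langle \texttt{'QUOTE}\ b_1 \rangle}
\]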

Now expand Pre(S), Post(S), N(S), and S by the formulas which apply for this case.
We previously showed that TRUE* is TRUE. Then simplifying logically we get:

\[
T|^{R_1}_{\langle \texttt{'QUOTE}\ b_1 \rangle^{*}} = T|^{R_1}_{\langle \texttt{'QUOTE}\ b_1 \rangle}
\]

Now a QUOTE expression is a constant distinct from NIL and thus substitution for
variables and NIL in the * will have no effect on it by subrule 4. Thus the verification
condition is TRUE.

To prove the stackok term of the statement of correctness we simply apply S6. This
completes the proof of the compiler for the ISQUOTE case.
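
The role of substitution in this case can be illustrated with a small Python sketch. It models terms as nested tuples of atoms; the function names `occurs` and `subst` and this term representation are assumptions made for the example, not the formalization of substitution used in the dissertation.

```python
def occurs(x, q):
    """True if the atom x appears anywhere in the term q."""
    if isinstance(q, tuple):
        return any(occurs(x, part) for part in q)
    return q == x

def subst(q, x, y):
    """Replace every occurrence of the atom x in the term q by y."""
    if isinstance(q, tuple):
        return tuple(subst(part, x, y) for part in q)
    return y if q == x else q

# A quoted constant that mentions neither NIL nor any variable.
quoted = ("QUOTE", "b1")
unchanged = subst(quoted, "NIL", 0)

# MOVEI-style precondition: substitute the loaded value for R1 in an assertion.
pre = subst(("EQ", "R1", "X"), "R1", quoted)
```

Here `unchanged` equals `quoted`, mirroring the use of subrule 4: substituting for something that does not occur in a term leaves the term as it was.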

A.8.9 ISFUNCTIONCALL Case

We here prove the statement of correctness of the compiler for the case where
ISFUNCTIONCALL(S):

\[
\begin{aligned}
&\mathrm{ISEXPRESSION}(S) \rightarrow \\
&\quad \Bigl( \mathrm{Pre}(S)|^{v}_{w}|^{\mathrm{NIL}}_{0} \land \bigl(\mathrm{Post}(S)|^{v}_{w}|^{\mathrm{NIL}}_{0} \rightarrow \forall R_2,\ldots,R_{N(S)}\,(T|^{R_1}_{S|^{v}_{w}|^{\mathrm{NIL}}_{0}})\bigr) \\
&\qquad \{\mathrm{FCOMPEXP}(S,M,\mathrm{LOCTABLE})\}\ T \Bigr) \\
&\quad \land\ \mathrm{stackok}(\mathrm{FCOMPEXP}(S,M,\mathrm{LOCTABLE}))
\end{aligned}
\]

where LOCTABLE is of form < < NAME1 . LOC1 > ... < NAMEr . LOCr > >,
v = < NAME1 ... NAMEr >, w = < m[M+P+LOC1] ... m[M+P+LOCr] >, and N(S) is a
function giving the maximum number of registers that are modified during execution of the
compilation of S.

We  have  that 

ISFUNCTIONCALL(S) → N(S) = max(N(f),N(b1),...,N(bn))

where  S is  of  form  < f bl  ...  bn  >. 

To prove the Hoare rule portion of the compiler correctness we will apply Hoare rules
to the T for the statements of FCOMPEXP for the particular case in question to form a
verification condition. We will apply simplifications to the verification condition during and
after generation to reduce it to TRUE.

Let S be of form < f b1 b2 ... bn >. Then we may express the code produced by the
compiler for this case by:

FCOMPEXP(S,M,LOCTABLE) = < ! FCOMPLIS(<b1 ... bn>,M,LOCTABLE)
                           ! FLOADAC(1-n,1)
                           < 'SUB 'P <'C 0 0 n n> >
                           < 'CALL n <'E f> > >


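We apply the following Hoare rule for a CALL in target language:

\[
\begin{aligned}
&\mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{R_1 \ldots R_n} \;\land \\
&\bigl(\mathrm{Exit}(f)|^{\mathrm{NIL}}_{0}|^{h}_{\langle f\ a_1 \ldots a_n \rangle}|^{a_1 \ldots a_n}_{R_1 \ldots R_n} \rightarrow \forall R_2,\ldots,R_{N(f)}\,(Q)|^{R_1}_{\langle f\ R_1 \ldots R_n \rangle}\bigr) \\
&\{\, \langle \texttt{'CALL}\ n\ \langle \texttt{'E}\ f \rangle \rangle \,\}\ Q
\end{aligned}
\]

where N(f) is the maximum number of registers modified during execution of function f, h is
the designation in Exit(f) for the result of the function call to f, and a1 ... an are the formal
arguments of f.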

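We obtain:

\[
\begin{aligned}
&\mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{R_1 \ldots R_n} \;\land \\
&\bigl(\mathrm{Exit}(f)|^{\mathrm{NIL}}_{0}|^{h}_{\langle f\ a_1 \ldots a_n \rangle}|^{a_1 \ldots a_n}_{R_1 \ldots R_n} \rightarrow \forall R_2,\ldots,R_{N(f)}\,(T)|^{R_1}_{\langle f\ R_1 \ldots R_n \rangle}\bigr)
\end{aligned}
\]

We apply the following SUB Hoare rule:

\[
Q|^{P}_{P-n}\ \{\, \langle \texttt{'SUB}\ \texttt{'P}\ \langle \texttt{'C}\ 0\ 0\ n\ n \rangle \rangle \,\}\ Q
\]

resulting in:

\[
\begin{aligned}
&\Bigl(\mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{R_1 \ldots R_n} \;\land \\
&\ \bigl(\mathrm{Exit}(f)|^{\mathrm{NIL}}_{0}|^{h}_{\langle f\ a_1 \ldots a_n \rangle}|^{a_1 \ldots a_n}_{R_1 \ldots R_n} \rightarrow \forall R_2,\ldots,R_{N(f)}\,(T)|^{R_1}_{\langle f\ R_1 \ldots R_n \rangle}\bigr)\Bigr)\Big|^{P}_{P-n}
\end{aligned}
\]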


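We distribute the P substitution to the three terms. Noting that Entry and Exit are in
source language, and so contain no P, we apply subrules 3b and 4 to drop the P substitution
on the first two terms. Applying the FLOADAC rule and distributing substitutions we
obtain:

\[
\begin{aligned}
&\mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{R_1 \ldots R_n}|^{R_1 \ldots R_n}_{m[P-n+1] \ldots m[P]} \;\land \\
&\bigl(\mathrm{Exit}(f)|^{\mathrm{NIL}}_{0}|^{h}_{\langle f\ a_1 \ldots a_n \rangle}|^{a_1 \ldots a_n}_{R_1 \ldots R_n}|^{R_1 \ldots R_n}_{m[P-n+1] \ldots m[P]} \rightarrow \\
&\quad \forall R_2,\ldots,R_{N(f)}\,(T)|^{R_1}_{\langle f\ R_1 \ldots R_n \rangle}|^{P}_{P-n}|^{R_1 \ldots R_n}_{m[P-n+1] \ldots m[P]}\bigr)
\end{aligned}
\]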


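Repeated application of subrule 7a allows us to reorder the substitutions on the Entry to the
order:

\[
\big|^{\mathrm{NIL}}_{0}\ \big|^{a_1}_{R_1}\ \big|^{R_1}_{m[P-n+1]}\ \cdots\ \big|^{a_n}_{R_n}\ \big|^{R_n}_{m[P]}
\]

Subrule 5 simplifies this to:

\[
\big|^{\mathrm{NIL}}_{0}\ \big|^{a_1}_{m[P-n+1]}\ \cdots\ \big|^{a_n}_{m[P]}
\]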


Similarly the substitutions on Exit become:

\[
\big|^{\mathrm{NIL}}_{0}\ \big|^{h}_{\langle f\ a_1 \ldots a_n \rangle}\ \big|^{a_1}_{m[P-n+1]}\ \cdots\ \big|^{a_n}_{m[P]}
\]

Applications of subrules 12, 3, and 13 establish that R1 ¬∈ the entire formula. We may
therefore apply the FCOMPLIS rule to obtain:

V Nil  V NIL 

Pte(l»l)  w 0 A (l'ost(bl)  w 0 . 

V NIL  m V NIL  m 

Pic(bn)  w 0 f^(n-l)  a (Post(bn)  w 0 P(n-I)  -♦ 

Vlt2 Rm,ix(N(bl) N(bn)) 


))...) 


We now distribute the P+n and the β(n) substitutions, then apply subrule 8 to P+n to
move it inward to the Entry and the Exit. The P+n substitutions may now be discarded by
subrule 4. On the T term we may similarly move the P+n in as far as the P-n substitution.
Then use subrule 9 on the P+n and P-n, and simplify arithmetically to obtain a substitution
of P for P, which may be discarded by subrule 14. The portion of the formula in the first ∀
is now:

\[
\begin{aligned}
&\bigl(\mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{m[P+1] \ldots m[P+n]}|^{m}_{\beta(n)} \;\land \\
&\ \bigl(\mathrm{Exit}(f)|^{\mathrm{NIL}}_{0}|^{h}_{\langle f\ a_1 \ldots a_n \rangle}|^{a_1 \ldots a_n}_{m[P+1] \ldots m[P+n]}|^{m}_{\beta(n)} \rightarrow \\
&\quad \forall R_2,\ldots,R_{N(f)}\,(T)|^{R_1}_{\langle f\ R_1 \ldots R_n \rangle}|^{R_1 \ldots R_n}_{m[P+1] \ldots m[P+n]}|^{m}_{\beta(n)}\bigr)\bigr)
\end{aligned}
\]

Subrule 16 may be used to move the ∀ containing this expression inward to the inside ∀
term (with its substitutions).

Recall that the a's are the formal parameters of function f. The a's ¬∈ w, which is a list
of elements from the array m. Now for some ai, either ai ¬∈ b's, or ai ∈ bj for one or more j's.
In the first case, subrule 3 gives us

\[
a_i \;\neg\!\in\; b_k|^{v}_{w}|^{\mathrm{NIL}}_{0}
\]

and therefore the a's ¬∈ β(n). In the second case, a formal parameter has simply been named
the same as a previously declared variable. Thus the previous declaration will cause it to
appear as a name in v. So we can apply subrule 12 at that point, and subrule 3 on the
substitutions of v made after that point to show that

\[
a_i \;\neg\!\in\; b_k|^{v}_{w}|^{\mathrm{NIL}}_{0}
\]

and therefore the a's ¬∈ β(n). So in either case we may apply subrule 8 repeatedly to β(n) on
both the Entry and Exit terms. Use subrule 7 to move the β(n) past the NIL and h
substitutions. Subrule 4 allows us to drop the β(n) substitution on both Entry and Exit.
Recall that β(n)[P+j] is

\[
b_j|^{v}_{w}|^{\mathrm{NIL}}_{0}
\]

for 1 ≤ j ≤ n. The R's ¬∈ b's and R's ¬∈ w, so subrule 3 gives us R's ¬∈ any term of β(n), and
therefore ¬∈ β(n). Thus we may apply subrule 8 on β(n) on the last term repeatedly. The
above part of the formula (including the outside ∀) is now:

\[
\begin{aligned}
&\mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{b_1^{*} \ldots b_n^{*}} \;\land \\
&\bigl(\mathrm{Exit}(f)|^{\mathrm{NIL}}_{0}|^{h}_{\langle f\ a_1 \ldots a_n \rangle}|^{a_1 \ldots a_n}_{b_1^{*} \ldots b_n^{*}} \rightarrow \\
&\quad \forall R_2,\ldots,R_{\max(N(b_1),\ldots,N(b_n))}\,\bigl( \forall R_2,\ldots,R_{N(f)}\,(T|^{m}_{\beta(n)})|^{R_1}_{\langle f\ R_1 \ldots R_n \rangle}|^{R_1 \ldots R_n}_{b_1^{*} \ldots b_n^{*}} \bigr)\bigr)
\end{aligned}
\]


Recall that the * notation means the substitutions

\[
\big|^{v}_{w}\ \big|^{\mathrm{NIL}}_{0}
\]

The form

\[
\big|^{x_1}_{y_1^{*}}\ \big|^{x_2}_{y_2}
\]

means

\[
\big|^{x_1}_{\,y_1|^{v}_{w}|^{\mathrm{NIL}}_{0}}\ \big|^{x_2}_{y_2}
\]

Apply subrule 9 to the R1 substitutions. Now apply subrule 8 followed by subrule 19 to
each of the other Ri's to make the last term of the above formula become:

\[
\forall R_2,\ldots,R_{\max(N(b_1),\ldots,N(b_n))}\,\bigl( \forall R_2,\ldots,R_{N(f)}\,(T|^{m}_{\beta(n)})|^{R_1}_{\langle f\ b_1^{*} \ldots b_n^{*} \rangle} \bigr)
\]

In a manner similar to showing Ri ¬∈ β, we show Ri ¬∈ the item substituted for R1.
Subrule 18 will now move the R1 substitution inside the inner ∀, and then subrule 17b can
combine the two ∀'s. The resulting ∀ includes R2,...,Rmax(N(f),N(b1),...,N(bn)).

Recall that β(j) is the same as m up to and including the Pth element. Thus the β
substitutions may be dropped (by subrule 14) if they are applied to an expression that refers
only to the first P elements of m. Now the elements of m past the Pth represent the values of
the formal parameters of f. Thus T and all the Pre's and Post's (after the substitutions that
change source language names to target language elements of m) will indeed not use m past
element P, since they cannot refer to the formal parameters of f.

To complete the verification condition generation, we now add the terms before the
{ in the FCOMPEXP Hoare rule as a hypothesis to the formula we have simplified. The
result is:

\[
\begin{aligned}
&\mathrm{Pre}(S)^{*} \land \bigl(\mathrm{Post}(S)^{*} \rightarrow \forall R_2,\ldots,R_{N(S)}\,(T|^{R_1}_{S^{*}})\bigr) \rightarrow \\
&\mathrm{Pre}(b_1)^{*} \land \bigl(\mathrm{Post}(b_1)^{*} \rightarrow \cdots\ \mathrm{Pre}(b_n)^{*} \land \bigl(\mathrm{Post}(b_n)^{*} \rightarrow \\
&\quad \mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{b_1^{*} \ldots b_n^{*}} \;\land \\
&\quad \bigl(\mathrm{Exit}(f)|^{\mathrm{NIL}}_{0}|^{h}_{\langle f\ a_1 \ldots a_n \rangle}|^{a_1 \ldots a_n}_{b_1^{*} \ldots b_n^{*}} \rightarrow \\
&\qquad \forall R_2,\ldots,R_{\max(N(f),N(b_1),\ldots,N(b_n))}\,(T|^{R_1}_{\langle f\ b_1^{*} \ldots b_n^{*} \rangle})\bigr) \\
&\bigr) \cdots \bigr)
\end{aligned}
\]

We now expand Pre(S) and Post(S) in the hypothesis by the formulas which apply for
the case of a function call. Let S = <f b1 ... bn>.

Distribute the v and NIL substitutions (the *) over the terms of Pre(S) and Post(S).
Now substitute TRUE in the conclusion for occurrences of the hypotheses, then simplify the
AND-IMPLY structure, resulting in:

Pre(bl)  * A ...  A Pre(bn)  ♦ a 


(Post(bl)  * A ...  A Post(bn)  » -»  Entry(f) 


al 

bl 


an 

bn 


V 

w 


NIL 

0 ) A 


♦ A . 

. . A Post (bn) 

* A 

a 1 . 

. . an  h 

l|v 

INIL 

bl  . 

. . bn  <f  bl  . 

. . bn>  1 w 

|o  -> 

R1 

RN(S) 

(T  <f  bl  ... 

A 

C 

* )) 

NIL 

al 

♦ A . 

. . A Post (bn) 

♦ -♦ 

Entry(f) 

1 0 

bl 

♦ A . , 

. . A Post (bn) 

* A 

NIL 

h 

al 

an 

0 

<f  al  ...  an> 

bl 

♦ . . . 

bn  * -» 

VR? Rmax(N(f),N(bl),....N(bn))  (T  |<f  bl  * 

Use subrule 8 to change the conclusion Exit term to:

(nil  |al  ...  Ian  ih 


bn  » >)) 


NIL 

al 

an 

0 

bl  « . . 

bn  » 

Then  use  subrule  8 to  change  the  hypothesis  Exit  term  to: 


a 1 . 

. an 

V NIL 

bl  . 

. bn 

w 0 

Apply subrule 15 to the Entry and Exit terms in the hypotheses. Since Entry(f) and
Exit(f) contain only a's and h (they must be expressed only in terms of the formal parameters
and result), we may apply subrule 12 and subrule 3 to establish that for any NAMEi in v,

\[
\mathrm{NAME}_i \;\neg\!\in\; \mathrm{Entry}(f)\big|^{a_1}_{\,b_1|^{v}_{w}} \cdots \big|^{a_n}_{\,b_n|^{v}_{w}}
\]

which resulted from subrule 15. Thus subrule 4 allows us to drop the final w substitution on
this term, and similarly drop it on the Exit term of the hypotheses. Then use subrule 8
repeatedly to move the NIL substitution in on both the Entry and Exit terms of the
hypotheses. The first conclusion now exactly matches a hypothesis, and so may be dropped.
The result is:

Pre(bl)  * A ...  A Pre(bn)  * a 


a trn 


Vi- 


MO 


(Pos t (bl)  * A . 

. A Post  (bn)  » -1 

(Pos t (bl ) * A . 

. A Post (bn)  * A 

NIL 

Lxit(f)  0 

a 1 

bl  * ... 

an 

bn  * 

VR2 RN(S) 

Rl 

(T  <f  bl 

...  bn 

(Pos  t (bl ) * A . . 

. A Post (bn)  * A 

NIL 

F.xit(f)  0 

a 1 ... 

bl  * ... 

an 

bn  * 

NIL 

a 1 ... 

0 

bl  « ... 

an 

bn  * ) A 


. bn  * > 


VR2 Rmax(N(f).N(bl) N(bn))  (T  |<f  bl  * ...  bn  » >)) 

Use subrule 6 repeatedly to distribute the NIL and the items in v into <f b1 ... bn> in
the last hypothesis. Since max(N(f),N(b1),...,N(bn)) is N(S), the conclusion matches the last
hypothesis. The Hoare rule for the function call case is proved.

The other term which we must prove about this case is
stackok(FCOMPEXP(S,M,LOCTABLE)). To prove it for the function call case, apply S3
and S4 to the target code produced in this case. This gets rid of the CALL statement. By
induction on n, we may easily derive the containspushes property of FCOMPLIS:

containspushes(FCOMPLIS(< b1 ... bn >, M, LOCTABLE), n)

This allows us to apply S2 to get a subgoal of stackok(FLOADAC(1-n,1)). But this is obvious
by S5 and the fact that FLOADAC always contains only MOVE statements.

This completes the proof of the compiler for the case of a function call.


A.8.10  ISLAMBDA  Case 

We here prove the statement of correctness of the compiler for the case where
ISLAMBDA(S). This case is similar to the function call case because a lambda expression is
essentially a call to an unnamed function. The major differences lie in passing the arguments
in the stack rather than the registers, and in having the code of the lambda compiled in-line.
The former difference avoids the use of the LOADAC sequence of instructions in the calling
code and the MKPUSH sequence in the called code, while the latter difference avoids the use
of the CALL and POPJ instructions. The statement of correctness is:

\[
\begin{aligned}
&\mathrm{ISEXPRESSION}(S) \rightarrow \\
&\quad \Bigl( \mathrm{Pre}(S)|^{v}_{w}|^{\mathrm{NIL}}_{0} \land \bigl(\mathrm{Post}(S)|^{v}_{w}|^{\mathrm{NIL}}_{0} \rightarrow \forall R_2,\ldots,R_{N(S)}\,(T|^{R_1}_{S|^{v}_{w}|^{\mathrm{NIL}}_{0}})\bigr) \\
&\qquad \{\mathrm{FCOMPEXP}(S,M,\mathrm{LOCTABLE})\}\ T \Bigr) \\
&\quad \land\ \mathrm{stackok}(\mathrm{FCOMPEXP}(S,M,\mathrm{LOCTABLE}))
\end{aligned}
\]

where LOCTABLE is of form < < NAME1 . LOC1 > ... < NAMEr . LOCr > >,
v = < NAME1 ... NAMEr >, w = < m[M+P+LOC1] ... m[M+P+LOCr] >, and N(S) is a
function giving the maximum number of registers that are modified during execution of the
compilation of S.

To prove the Hoare rule portion we will apply Hoare rules to the T for the statements
of FCOMPEXP for the particular case in question to form a verification condition. We will
apply simplifications to the verification condition during and after generation to reduce it to
TRUE.

Let S be of form < <'LAMBDA <a1 ... an> exp> b1 ... bn >. Then we may express the
code produced by the compiler for this case by:

FCOMPEXP(S,M,LOCTABLE) = < ! FCOMPLIS(<b1 ... bn>,M,LOCTABLE)
                           ! FCOMPEXP(exp,M-n,ADDIDS(LOCTABLE,
                                                     <a1 ... an>,
                                                     1-M))
                           < 'SUB 'P <'C 0 0 n n> > >


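Using the SUB Hoare rule:

\[
Q|^{P}_{P-n}\ \{\, \langle \texttt{'SUB}\ \texttt{'P}\ \langle \texttt{'C}\ 0\ 0\ n\ n \rangle \rangle \,\}\ Q
\]

gives us

\[
T|^{P}_{P-n}
\]

Assuming the FCOMPEXP properties on exp, a smaller portion of source code (the inductive
assumption), we get:

\[
\mathrm{Pre}(exp)|^{v'}_{w'}|^{\mathrm{NIL}}_{0} \land \bigl(\mathrm{Post}(exp)|^{v'}_{w'}|^{\mathrm{NIL}}_{0} \rightarrow \forall R_2,\ldots,R_{N(exp)}\,(T|^{P}_{P-n}|^{R_1}_{\,exp|^{v'}_{w'}|^{\mathrm{NIL}}_{0}})\bigr)
\]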


We prime the v and w simply to distinguish them from v and w used earlier. They are
different from the earlier ones because the LOCTABLE appearing in the FCOMPEXP
property we are applying is now to have the value of ADDIDS(LOCTABLE,<a1 ... an>,1-M),
and M is now to have the value M-n. From the code of the compiler we have:

ADDIDS(LOCTABLE,<a1 ... an>,1-M) = APPEND(PRUP(<a1 ... an>,1-M),
                                          LOCTABLE)

We show in the proof of COMP that PRUP(<a1 ... an>,k) =

< <a1 . k> <a2 . k+1> ... <an . k+n-1> >

Thus if LOCTABLE is of form

< < NAME1 . LOC1 > ... < NAMEr . LOCr > >

then ADDIDS(LOCTABLE,<a1 ... an>,1-M) =

< <a1 . 1-M> <a2 . 2-M> ... <an . n-M>
  <NAME1 . LOC1> ... <NAMEr . LOCr> >

Thus v' = < a1 a2 ... an NAME1 ... NAMEr >, and
w' = < m[P-n+1] m[P-n+2] ... m[P] m[P+M-n+LOC1] ... m[P+M-n+LOCr] >.
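
The arithmetic behind v' and w' can be checked with a short Python sketch. It follows the equations quoted above for PRUP and ADDIDS; the Python names `prup`, `addids`, and `stack_offsets`, and the use of lists of pairs for LOCTABLE, are assumptions made for the illustration.

```python
def prup(names, k):
    """PRUP(<a1 ... an>,k) = < <a1 . k> <a2 . k+1> ... <an . k+n-1> >."""
    return [(name, k + i) for i, name in enumerate(names)]

def addids(loctable, names, k):
    """ADDIDS(LOCTABLE,names,k) = APPEND(PRUP(names,k), LOCTABLE)."""
    return prup(names, k) + loctable

def stack_offsets(loctable, M):
    """Offsets d such that each name is held in m[P + d], i.e. d = M + LOC."""
    return [(name, M + loc) for name, loc in loctable]

# Example with n = 2 lambda parameters and an outer M of -3 (values chosen arbitrarily).
M, params = -3, ["A", "B"]
outer = [("X", 1), ("Y", 2)]
inner = addids(outer, params, 1 - M)
offsets = stack_offsets(inner, M - len(params))
```

In this example the parameters land at offsets -1 and 0, that is m[P-n+1] and m[P], and each outer name moves from m[P+M+LOC] to m[P+M-n+LOC], which agrees with the form of w' above.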

We can see that R1 ¬∈ Pre's, R1 ¬∈ Post's, R1 ¬∈ w', R1 ¬∈ exp, and R1 ¬∈ 0. Then by
subrule 3b, subrule 12, and subrule 13 we get R1 ¬∈ the entire formula above. Thus we may
apply the FCOMPLIS rule to it to get:

\[
\begin{aligned}
&\mathrm{Pre}(b_1)^{*} \land \bigl(\mathrm{Post}(b_1)^{*} \rightarrow \\
&\mathrm{Pre}(b_2)^{*}|^{m}_{\beta(1)} \land \bigl(\mathrm{Post}(b_2)^{*}|^{m}_{\beta(1)} \rightarrow \\
&\quad \cdots \\
&\mathrm{Pre}(b_n)^{*}|^{m}_{\beta(n-1)} \land \bigl(\mathrm{Post}(b_n)^{*}|^{m}_{\beta(n-1)} \rightarrow \\
&\quad \forall R_2,\ldots,R_{\max(N(b_1),\ldots,N(b_n))}\,\Bigl( \bigl(\mathrm{Pre}(exp)|^{v'}_{w'}|^{\mathrm{NIL}}_{0} \land (\mathrm{Post}(exp)|^{v'}_{w'}|^{\mathrm{NIL}}_{0} \rightarrow \\
&\qquad \forall R_2,\ldots,R_{N(exp)}\,(T|^{P}_{P-n}|^{R_1}_{\,exp|^{v'}_{w'}|^{\mathrm{NIL}}_{0}}))\bigr)\Big|^{P}_{P+n}\Big|^{m}_{\beta(n)} \Bigr) \\
&\bigr) \cdots \bigr)\bigr)
\end{aligned}
\]

We now distribute the P+n and β(n) substitutions, then apply subrule 8 to P+n to move
it inward to the Pre(exp) and Post(exp). The P+n substitutions may now be discarded by
subrule 4. Since all Ri's ¬∈ w, bj, or 0, subrule 3 yields that Ri's ¬∈ all elements of β(n), and
therefore Ri's ¬∈ β(n). Now subrule 18b allows us to move the P+n and β(n) substitutions
inside the inner ∀. Use subrule 8 to move the P+n in as far as the P-n. Then use subrule 9
on the P+n and P-n, and simplify arithmetically to obtain a substitution of P for P, which
may be discarded by subrule 14. The portion of the formula in the first ∀ is now:


v’ 

NIL 

tn 

v’ 

NIL 

Pre (exp) 

w’  ’ 

0 

j3(n)  A (Post  (exp) 

w” 

0 

VR2 RN(exp)  (1 


Rl 

exp 


NIL  P 
0 P+n 


m 

0(n)  )) 


where w'' = < m[P+1] m[P+2] ... m[P+n] m[P+M+LOC1] ... m[P+M+LOCr] >. Similarly we
may move the P+n inward to the exp, then discard P+n. The w' on the exp will of course
become a w''. Subrule 16 may be used to move the ∀ containing this expression inward to the
inside ∀ term. Now apply subrule 17b to make the above part of the formula (including the
outside ∀):


v’ 

NIL 

m 

v’ 

NIL 

tn 

Pre (exp) 

w’  ’ 

0 

(J(n)  A (Post  (exp) 

w” 

0 

(3(n)  -♦ 

Rl 

1 

|v’ 

INIL 

VR2 Rmax(N(exp).N(bl) N(bn))  (T 


exp 


m 

a(n)  )) 


Apply subrule 8 to move the β substitution inward to the T. Since for all elements X of
v' we have X ¬∈ β(n), the latter being in target language, we can also use subrule 8 to move
the β substitutions inward to Pre(exp) and Post(exp). They can then be dropped by subrule 4.
Recall that β(n)[P+j] is

\[
b_j|^{v}_{w}|^{\mathrm{NIL}}_{0}
\]

for 1 ≤ j ≤ n, and β(n)[P+j] is m[P+j] for -P ≤ j ≤ 0. Since -M is the size of the stack, all
M+LOCi's are between -P and 0. Thus the above portion of the formula is:


Pre(exp) 


(Pos  t (exp) 


1 a 1 ... 

an 

l|v 

1 

|bl  « . . . 

bn  « 

lin 

a 1 

Ian 

1 

\ 

') 

bl  * ... 

bn  * 

« 

NIL 
0 A 


NIL 

0 


VH? ,RiHax(N(exp).N(bJ) N(bn)) 


Rl 

m 

a 1 

an 

V 

(T 

0(n) 

exp 

bl  * ... 

bn  * 

w 

Recall that β(j) is the same as m up to and including the Pth element. Thus the β
substitutions may be dropped (by subrule 14) if they are applied to an expression that refers
only to the first P elements of m. Now the elements of m past the Pth represent the values of
the formal parameters of the lambda. Thus T and all the Pre's and Post's (after the
substitutions that change source language names to target language elements of m) will indeed
not use m past element P, since they cannot refer to the formal parameters of the lambda. T
could have resulted from an assertion that mentions the lambda expression as a whole. But
the formal parameters of that lambda expression are not free uses of the names, and so will
not be substituted for to produce elements of m past P.

We will use N(S) for the max (they are equal for the case of S being a lambda). Then
to complete the verification condition generation, we add as a hypothesis the terms before the
{ in the Hoare rule we are trying to prove. The result is:


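\[
\begin{aligned}
&\mathrm{Pre}(S)^{*} \land \bigl(\mathrm{Post}(S)^{*} \rightarrow \forall R_2,\ldots,R_{N(S)}\,(T|^{R_1}_{S^{*}})\bigr) \rightarrow \\
&\mathrm{Pre}(b_1)^{*} \land \bigl(\mathrm{Post}(b_1)^{*} \rightarrow \\
&\mathrm{Pre}(b_2)^{*} \land \bigl(\mathrm{Post}(b_2)^{*} \rightarrow \\
&\quad \cdots \\
&\mathrm{Pre}(b_n)^{*} \land \bigl(\mathrm{Post}(b_n)^{*} \rightarrow \\
&\quad \mathrm{Pre}(exp)|^{a_1 \ldots a_n}_{b_1^{*} \ldots b_n^{*}}|^{v}_{w}|^{\mathrm{NIL}}_{0} \;\land \\
&\quad \bigl(\mathrm{Post}(exp)|^{a_1 \ldots a_n}_{b_1^{*} \ldots b_n^{*}}|^{v}_{w}|^{\mathrm{NIL}}_{0} \rightarrow \\
&\qquad \forall R_2,\ldots,R_{N(S)}\,(T|^{R_1}_{\,exp|^{a_1 \ldots a_n}_{b_1^{*} \ldots b_n^{*}}|^{v}_{w}|^{\mathrm{NIL}}_{0}})\bigr) \\
&\bigr) \cdots \bigr)\bigr)
\end{aligned}
\]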


We now expand Pre(S) and Post(S) in the hypothesis by the formulas which apply for
the case of a lambda. Let

\[
S = exp|^{a_1 \ldots a_n}_{b_1 \ldots b_n}
\]

Distribute the v and NIL substitutions (the *) over the terms of Pre(S) and Post(S).
Now substitute TRUE in the conclusion for occurrences of the hypotheses, then simplify the
AND-IMPLY structure, obtaining:

Pre(bl)  * A ...  A Pre(bn)  * a 


a 1 . . 

. . an 

V 

(Pos  t (bl ) * A . . 

, . A Post  (bn)  * -»  Pre(exp) 

bl  . 

, . bn 

w 

NIL 

0 ) A 


a 1 . . 

. . an 

V 

(Post (bl)  * A . . 

, . A Post (bn)  * A Post (exp) 

bl  . 

. . bn 

w 

NIL 

0 


Rl 

a 1 . 

. . an 

V 

VR2, . 

. .,RN(S)  (T 

exp 

bl  . 

. . bn 

w 

al 

an 

V 

(Post (bl ) * A . . 

. A Post  (bn)  » -♦  Pre(exp) 

bl  » ... 

bn  * 

w 

Rl 

a 1 ... 

an 

V 

VR2,  . . 

. ..RN(S)  (T 

exp 

bl  * ... 

bn  * 

w 

NIL 
0 )) 


a 1 . 

. . an 

V 

X 

bl  . 

. . bn 

w 

We  will  now  work  with  the  series  of  substitutions 


NIL 

0 


First  we  apply  subrule  15  to  obtain: 

NIL 
0 


» 1 

an 

V 

X 

bl 

w . . . 

bn 

Now apply subrule 7b to get:


a 1 

. . . 

an 

V 

X 

bl 

w . . . 

bn 

NIL 

0 


a 1 ... 

an 

V 

(Pos t (bl ) ♦ A . . 

. A Post (bn)  * A Post (exp) 

bl  * ... 

bn  » 

w 

NIL 
0 ) 

NIL 

0 


Apply subrule 8 repeatedly, yielding:


a I ...  an  v 

NIL  V NIL.  V NIL 

XO  bl  wO  ...  bn  wO  w 

Now by subrule 12 we have

\[
\mathrm{NIL} \;\neg\!\in\; b_i|^{v}_{w}|^{\mathrm{NIL}}_{0}
\]

So  we  can  now  apply  subrule  7b  to  obtain: 


V NIL 


a 1 

an 

V 

NIL 

V 

X 

bl 

w 

0 

bn 

w 

or, using the * notation:

al 

an 

V 

NIL 

X 

bl  » 

bn  * 

w 

0 

By replacing the series of substitutions with this equivalent in the three places they occur in
the verification condition, we obtain conclusions that exactly match hypotheses, so the Hoare
rule for the lambda case is proved.

To prove the stackok part of this case, we recall that we proved

containspushes(FCOMPLIS(<b1 ... bn>,M,LOCTABLE),n).

We then apply axiom S2 to get the subgoal of

stackok(FCOMPEXP(exp,M-n,ADDIDS(LOCTABLE,<a1 ... an>,1-M))).

But this is TRUE by inductive hypothesis.

This proves the lambda case.


A.8.11 Function Definition Case

We here prove the statement of correctness of the compiler for the case of a function
definition (ISFUNCTIONDEF(S) is TRUE):

\[
\begin{aligned}
&\mathrm{ISFUNCTIONDEF}(S) \rightarrow \\
&\quad \Bigl(\mathrm{Entry}(f)|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{R_1' \ldots R_n'}\ \{\,\mathrm{FCOMP}(S)\,\}\ \mathrm{Exit}(f)|^{h}_{R_1}|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{R_1' \ldots R_n'}\Bigr) \\
&\quad \land\ \mathrm{stackokreturns}(\mathrm{FCOMP}(S))
\end{aligned}
\]

where S = < 'DE f < a1 ... an > exp >, h is the designation used in Exit(f) for the function
value returned by the function f, and Ri' is the initial value of register Ri.

For S in this form, we have:

FCOMP(S) = < < 'LAP f 'SUBR >
             ! FMKPUSH(n,1)
             ! FCOMPEXP(exp,-n,PRUP(<a1 ... an>,1))
             < 'SUB 'P <'C 0 0 n n> >
             < 'POPJ 'P >
             'NIL >


We will apply Hoare rules to the right-hand portion of the Hoare rule part of the
statement of correctness for the statements of FCOMP(S) in order to prove it. The 'NIL is
just an end marker for the compiled code, and so may be ignored. We apply the POPJ
Hoare rule (essentially a RETURN rule):

\[
\mathrm{Exit}(f)|^{h}_{R_1}|^{\mathrm{NIL}}_{0}|^{a_1 \ldots a_n}_{R_1' \ldots R_n'}\ \{\, \langle \texttt{'POPJ}\ \texttt{'P} \rangle \,\}\ Q
\]

where f is the function in which we find the POPJ instruction and h and the a's are as
previously, to obtain the same result.

Using the SUB Hoare rule:

Q|^{P}_{P-n}  { < 'SUB 'P <'C 0 0 n n> > }  Q

gives us:

Exit(f)|^{h}_{R1}|^{NIL}_{0}|^{a1 ... an}_{R1' ... Rn'}|^{P}_{P-n}

Since P ¬∈ Exit(f), nor 0, nor any registers, we can apply subrule 3b to establish

P ¬∈ Exit(f)|^{h}_{R1}|^{NIL}_{0}|^{a1 ... an}_{R1' ... Rn'}

Subrule 4 then allows us to drop the P-n substitution.

We now use the FCOMPEXP part of the statement of correctness inductively.
LOCTABLE will be PRUP(<a1 ... an>,1), M will be -n, and S will be exp in using the
FCOMPEXP properties. From the code of PRUP we have:

PRUP(<a1 ... an>,k) = < <a1 . k> ! PRUP(<a2 ... an>,k+1) >


Induction on n gives us:

PRUP(<a1 ... an>,k) = < <a1 . k> <a2 . k+1> ... <an . k+n-1> >

For the case where k = 1, we have:

PRUP(<a1 ... an>,1) = < <a1 . 1> <a2 . 2> ... <an . n> >
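
The closed form obtained for PRUP can be checked on small inputs. The following is a minimal Python sketch, not the dissertation's Lisp code: the names `prup` and `prup_closed` are hypothetical, lists stand in for Lisp lists, and tuples stand in for dotted pairs. The `!` splice of the recursive definition is modeled by list concatenation.

```python
def prup(names, k):
    """Recursive form: pair the first name with k, then recur on the rest with k+1."""
    if not names:
        return []
    return [(names[0], k)] + prup(names[1:], k + 1)


def prup_closed(names, k):
    """Closed form from the induction: < <a1 . k> <a2 . k+1> ... <an . k+n-1> >."""
    return [(a, k + i) for i, a in enumerate(names)]


if __name__ == "__main__":
    # For k = 1 the i-th formal parameter is paired with i.
    print(prup(["a1", "a2", "a3"], 1))
```

Under these assumptions the two functions agree for any starting value k, which is the content of the induction on n.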

Thus the v of the FCOMPEXP properties is <a1 ... an>, and w is <m[-n+P+1] ... m[P]>.
Thus verification condition generation back through FCOMPEXP gives us:


Pre(exp)|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0} ∧

(Post(exp)|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0} →

    ∀R2,...,RN(exp) ( Exit(f)|^{h}_{R1}|^{NIL}_{0}|^{a1 ... an}_{R1' ... Rn'}|^{R1}_{exp|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0}} ))


For all i, 1 ≤ i ≤ n, we may apply subrule 12 at the point where we substitute for ai on
exp. Then apply subrule 3b for the remaining substitutions on exp to get

ai ¬∈ exp|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0}


Then we may apply subrule 7b to rearrange the Exit term to:

Exit(f)|^{h}_{R1}|^{R1}_{exp|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0}}|^{NIL}_{0}|^{a1 ... an}_{R1' ... Rn'}


Since R1 ¬∈ Exit(f), we may apply subrule 5 to obtain:

Exit(f)|^{h}_{exp|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0}}|^{NIL}_{0}|^{a1 ... an}_{R1' ... Rn'}


For all Ri (2 ≤ i ≤ N(exp)) we may apply subrule 3b to show that

Ri ¬∈ exp|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0}

then again to show that Ri ¬∈ the quantified expression. We can therefore drop the
quantifier by subrule 16c.

Apply the FMKPUSH rule to the entire expression, then distribute those FMKPUSH
substitutions over the three terms. On both the Pre and Post terms we may apply subrule 8
repeatedly to each of the FMKPUSH substitutions to move them inward to Pre(exp) or
Post(exp). Since P ¬∈ Pre(exp) or Post(exp) and m ¬∈ Pre(exp) or Post(exp), we may drop all
the FMKPUSH substitutions directly on Pre(exp) and Post(exp) by subrule 4. The
explanation at the end of the FMKPUSH rule derivation shows us that the m's result in Ri's
when the FMKPUSH substitutions are applied to them. Thus we have:


Pre(exp)|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0} ∧

(Post(exp)|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0} →

    Exit(f)|^{h}_{exp|^{a1 ... an}_{m[P-n+1] ... m[P]}|^{NIL}_{0}}|^{NIL}_{0}|^{a1 ... an}_{R1' ... Rn'}|^{P}_{P+n}|^{m}_{α(m,P+n,Rn)} ... |^{m}_{α(m,P+1,R1)} )


Use subrule 7b on each of the FMKPUSH substitutions to move them inward past the
NIL substitution. Then use subrule 8 to move each FMKPUSH substitution in past the h
substitution. Then apply subrule 4 to drop all FMKPUSH substitutions directly applied to
Exit(f). Now we still have the FMKPUSH substitutions applied to the end of the exp phrase.
We can apply subrule 8 to move the FMKPUSH substitutions inward to exp, at which point
subrule 4 allows us to drop them. The action of subrule 8 just applied has resulted in the
FMKPUSH substitutions being applied to the m's, and we have previously shown that this
changes the m's to Ri's. The result is:
Pre(exp)|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0} ∧

(Post(exp)|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0} →

    Exit(f)|^{h}_{exp|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0}}|^{NIL}_{0}|^{a1 ... an}_{R1' ... Rn'} )


Ignoring  the  LAP  statement  as  simply  an  externally  available  label,  we  add  the  Entry 
hypothesis  and  drop  the  primes  (initial  value  is  same  as  present  value  here)  to  complete  the 
verification  condition  generation,  obtaining: 


Entry(f)|^{NIL}_{0}|^{a1 ... an}_{R1 ... Rn} →

    Pre(exp)|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0} ∧

    (Post(exp)|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0} →

        Exit(f)|^{h}_{exp|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0}}|^{NIL}_{0}|^{a1 ... an}_{R1 ... Rn} )


Apply subrule 19 at the NIL substitutions in the Exit term. We may then use subrule
7b to rearrange the a1 substitution to the place before the NIL, and apply subrule 20 again.
Similarly we can apply subrule 20 to all the substitutions on exp, then arrange the substitution
of the a's back to their original order. Then use subrule 7b to place all NIL substitutions at
the end of their terms. Subrule 6 applied then gives:


(Entry(f) → Pre(exp) ∧ (Post(exp) → Exit(f)|^{h}_{exp}))|^{a1 ... an}_{R1 ... Rn}|^{NIL}_{0}


But the quantity in the outermost parentheses must be TRUE in order to have a proof
of the source program in the source language. Since TRUE with substitutions is still TRUE,
we have proved the Hoare rule part of the case of a function definition.

To prove the stackokreturns property we use axiom S1, which reduces the problem to
that of proving:

stackok( < ! FMKPUSH(n,1)
           ! FCOMPEXP(exp,-n,PRUP(<a1 ... an>,1))
           < 'SUB 'P <'C 0 0 n n> >
         > )

Note that we are still ignoring the LAP and NIL for the reasons given above. By induction
on n it is easy to obtain from axioms C1 and C3:

containspushes(FMKPUSH(n,1),n)

We may now apply axiom S2, obtaining as the subgoal
stackok(FCOMPEXP(exp,-n,PRUP(<a1 ... an>,1))). But this may be assumed by the inductive step.

This  concludes  the  proof  of  the  properties  of  FCOMP. 

A.8.12  COMPLIS 

We  here  derive  a Hoare  rule  to  describe  the  action  of  the  instructions  represented  by 
FCOMPLIS.  From  the  code  of  COMPLIS  we  have: 

FCOMPLIS(<b1 ... bn>,M,LOCTABLE) =
    < ! FCOMPEXP(b1,M,LOCTABLE)
      < 'PUSH 'P 1 >
      ! FCOMPLIS(<b2 ... bn>,M-1,LOCTABLE)
    >

if n > 0, else an empty list is the result. Induction on n shows us that

FCOMPLIS(<b1 ... bn>,M,LOCTABLE) =
    < ! FCOMPEXP(b1,M,LOCTABLE)
      < 'PUSH 'P 1 >
      ! FCOMPEXP(b2,M-1,LOCTABLE)
      < 'PUSH 'P 1 >
      ...
      ! FCOMPEXP(bn,M-n+1,LOCTABLE)
      < 'PUSH 'P 1 >
    >
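
The unrolling of FCOMPLIS can be illustrated with a short Python sketch. This is an assumption-laden model rather than the compiler's code: `fcompexp` here is a hypothetical placeholder that returns an opaque marker recording the expression and the M it was handed, and instructions are plain tuples.

```python
def fcompexp(exp, m, loctable):
    """Placeholder for the compiled code of exp; only records exp and M."""
    return [("CODE", exp, m)]


def fcomplis(exps, m, loctable):
    """Recursive form: compile the head, push R1, recur with M-1."""
    if not exps:
        return []
    return (fcompexp(exps[0], m, loctable)
            + [("PUSH", "P", 1)]
            + fcomplis(exps[1:], m - 1, loctable))


def fcomplis_unrolled(exps, m, loctable):
    """Unrolled form: the i-th argument (counting from 0) is compiled with M-i."""
    code = []
    for i, b in enumerate(exps):
        code += fcompexp(b, m - i, loctable)
        code.append(("PUSH", "P", 1))
    return code


if __name__ == "__main__":
    for instr in fcomplis(["b1", "b2", "b3"], 0, {}):
        print(instr)
```

The point of the sketch is only that M decreases by one per argument, so the last argument is compiled with M-n+1.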


Since we are proving the compiler by induction on the syntactic structure of the source
language, we may assume the FCOMPEXP result on smaller pieces of code, such as the b's.
We apply this result and the following PUSH rule n times:

Q|^{m}_{α(m,P,R1)}|^{P}_{P+1}  { < 'PUSH 'P 1 > }  Q

resulting in:

Pre(b1)|^{v}_{W(0)}|^{NIL}_{0} ∧ (Post(b1)|^{v}_{W(0)}|^{NIL}_{0} → ∀R2,...,RN(b1) ((

 Pre(b2)|^{v}_{W(-1)}|^{NIL}_{0} ∧ (Post(b2)|^{v}_{W(-1)}|^{NIL}_{0} → ∀R2,...,RN(b2) ((

  ...

   Pre(bn-1)|^{v}_{W(2-n)}|^{NIL}_{0} ∧ (Post(bn-1)|^{v}_{W(2-n)}|^{NIL}_{0} → ∀R2,...,RN(bn-1) ((

    Pre(bn)|^{v}_{W(1-n)}|^{NIL}_{0} ∧ (Post(bn)|^{v}_{W(1-n)}|^{NIL}_{0} → ∀R2,...,RN(bn) (

     Q|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn|^{v}_{W(1-n)}|^{NIL}_{0}} ))

   )|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-1|^{v}_{W(2-n)}|^{NIL}_{0}} ))

  ...

 )|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b2|^{v}_{W(-1)}|^{NIL}_{0}} ))

)|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} ))


where W(k) = < m[M+P+k+LOC1] ... m[M+P+k+LOCr] >.

The W notation is necessitated by the fact that the M handed to COMPEXP is
different at each call. We will eventually have only W(0) terms and be able to use the lower
case w, which is equal to W(0) by definition.

Distribute the α, P+1, and bi etc. substitutions onto the Pre and Post terms. For all Ri
(i ≥ 2) we have that Ri ¬∈ W(k), Ri ¬∈ 0, Ri ¬∈ α(m,P,R1), Ri ¬∈ P+1, Ri ¬∈ bj, Ri ¬∈ Pre(bj),
and Ri ¬∈ Post(bj). Application of subrule 3b to first all the bj substitutions, then to the
substitutions on all the Pre's and Post's results in Ri ¬∈ any Pre or Post term with all
substitutions indicated having been applied. We may now apply subrules 16a and 16b to
obtain:


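Pre(b1)|^{v}_{W(0)}|^{NIL}_{0} ∧ (Post(b1)|^{v}_{W(0)}|^{NIL}_{0} →

 Pre(b2)|^{v}_{W(-1)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} ∧

 (Post(b2)|^{v}_{W(-1)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} →

 ∀R2,...,RN(b1) ((

  ...

  Pre(bn-1)|^{v}_{W(2-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-2|^{v}_{W(3-n)}|^{NIL}_{0}} ∧

  (Post(bn-1)|^{v}_{W(2-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-2|^{v}_{W(3-n)}|^{NIL}_{0}} →

  ∀R2,...,RN(bn-2) ((

   Pre(bn)|^{v}_{W(1-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-1|^{v}_{W(2-n)}|^{NIL}_{0}} ∧

   (Post(bn)|^{v}_{W(1-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-1|^{v}_{W(2-n)}|^{NIL}_{0}} →

   ∀R2,...,RN(bn-1) ((

    ∀R2,...,RN(bn) ( Q|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn|^{v}_{W(1-n)}|^{NIL}_{0}} )

   )|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-1|^{v}_{W(2-n)}|^{NIL}_{0}} ))

  )|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-2|^{v}_{W(3-n)}|^{NIL}_{0}} ))

  ...

 )|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} )))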

By alternating this process of distributing substitutions onto Pre and Post terms, then
subrules 16a and 16b (with a similar justification involving subrule 3), we eventually arrive at:
Pre(b1)|^{v}_{W(0)}|^{NIL}_{0} ∧ (Post(b1)|^{v}_{W(0)}|^{NIL}_{0} →

 Pre(b2)|^{v}_{W(-1)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} ∧

 (Post(b2)|^{v}_{W(-1)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} →

 ...

 Pre(bn-1)|^{v}_{W(2-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-2|^{v}_{W(3-n)}|^{NIL}_{0}} ... |^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} ∧

 (Post(bn-1)|^{v}_{W(2-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-2|^{v}_{W(3-n)}|^{NIL}_{0}} ... |^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} →

 Pre(bn)|^{v}_{W(1-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-1|^{v}_{W(2-n)}|^{NIL}_{0}} ... |^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} ∧

 (Post(bn)|^{v}_{W(1-n)}|^{NIL}_{0}|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-1|^{v}_{W(2-n)}|^{NIL}_{0}} ... |^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} →

 ∀R2,...,RN(b1) ( ∀R2,...,RN(b2) ( ...

  ∀R2,...,RN(bn-1) ( ∀R2,...,RN(bn) ( Q|^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn|^{v}_{W(1-n)}|^{NIL}_{0}} )

   |^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{bn-1|^{v}_{W(2-n)}|^{NIL}_{0}} )

  ...

   |^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b2|^{v}_{W(-1)}|^{NIL}_{0}} )

   |^{m}_{α(m,P,R1)}|^{P}_{P+1}|^{R1}_{b1|^{v}_{W(0)}|^{NIL}_{0}} )

 )) ... ))

Use subrule 8 to place every P+1 substitution just inside the corresponding α
substitution, resulting in all α's becoming α(m,P+1,R1). We must now make the assumption
that R1 ¬∈ Q. This assumption must be verified before applying the rule for COMPLIS that
we are deriving here. Using the facts R1 ¬∈ Pre's, R1 ¬∈ Post's, R1 ¬∈ W(k), and R1 ¬∈ 0, we
can apply subrules 12 and 3 to show that in all places of the form

X|^{m}_{α(m,P+1,R1)}|^{R1}_{bi|^{v}_{W(1-i)}|^{NIL}_{0}}

it holds that R1 ¬∈ X. So we apply subrule 5 in all such places to get:

X|^{m}_{α(m,P+1,bi|^{v}_{W(1-i)}|^{NIL}_{0})}

We may now apply subrule 8 repeatedly to all P+1 substitutions in Pre or Post terms
until they have moved in next to the innermost P+1. This will require us to apply a P+j
substitution to W(k). We now show that this results in W(k+j). First recall that W(k) means

< m[M+P+k+LOC1] ... m[M+P+k+LOCr] >. Application of the substitution gives us

< m[M+P+j+k+LOC1] ... m[M+P+j+k+LOCr] >, which is W(k+j). Thus using subrule 8
repeatedly gives us:

X|^{v}_{W(k)}|^{P}_{P+j} = X|^{P}_{P+j}|^{v}_{W(k+j)}

Additionally, if P ¬∈ X, we may drop the P+j substitution on the right, by use of subrule 4.
We now use subrule 9 repeatedly in all places of the form

X|^{P}_{P+1}|^{P}_{P+1} ... |^{P}_{P+1}

where the substitution occurs r times (r ≥ 2) and simplify arithmetic to obtain

X|^{P}_{P+r}
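
The two facts just used (a P+j substitution turns W(k) into W(k+j), and r successive P+1 substitutions collapse into one P+r substitution) can be checked semantically. The Python sketch below is an illustration under stated assumptions, not the dissertation's formal subrules: formulas and expressions are modeled as functions of a state dictionary, and the helper names `substitute`, `bump_p`, and `w`, together with the sample LOC offsets, are hypothetical.

```python
def substitute(q, var, expr):
    """Model of q with expr substituted for var.

    Evaluating the substituted formula in a state is the same as evaluating
    q in that state with var rebound to the value of expr.
    """
    def substituted(state):
        new_state = dict(state)
        new_state[var] = expr(state)
        return q(new_state)
    return substituted


def bump_p(q, j):
    """q with P+j substituted for P."""
    return substitute(q, "P", lambda s: s["P"] + j)


def w(k, locs=(1, 2, 3)):
    """W(k) = < m[M+P+k+LOC1] ... m[M+P+k+LOCr] > for assumed LOC offsets."""
    return lambda s: [s["m"][s["M"] + s["P"] + k + loc] for loc in locs]


if __name__ == "__main__":
    state = {"P": 5, "M": -2, "m": list(range(100, 140))}
    q = lambda s: s["P"] * 2
    q_r = q
    for _ in range(4):                 # apply the P+1 substitution four times
        q_r = bump_p(q_r, 1)
    print(q_r(state), bump_p(q, 4)(state))
    print(bump_p(w(1), 3)(state), w(4)(state))
```

This only tests the equalities on sample states; the subrules themselves are syntactic and are justified separately.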

Subrule 7 allows us to interchange the order of all the NIL-0 and P+j substitutions. We then
apply the P+j substitutions to the W(k) substitutions, as justified above. The P+j's get
dropped then since P ¬∈ Pre's or Post's. The result is:

Pre(b1)|^{v}_{W(0)}|^{NIL}_{0} ∧ (Post(b1)|^{v}_{W(0)}|^{NIL}_{0} →

 Pre(b2)|^{v}_{W(0)}|^{NIL}_{0}|^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})} ∧

 (Post(b2)|^{v}_{W(0)}|^{NIL}_{0}|^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})} →

 ...

 Pre(bn-1)|^{v}_{W(0)}|^{NIL}_{0}|^{m}_{α(m,P+n-2,bn-2|^{v}_{W(0)}|^{NIL}_{0})} ... |^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})} ∧

 (Post(bn-1)|^{v}_{W(0)}|^{NIL}_{0}|^{m}_{α(m,P+n-2,bn-2|^{v}_{W(0)}|^{NIL}_{0})} ... |^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})} →

 Pre(bn)|^{v}_{W(0)}|^{NIL}_{0}|^{m}_{α(m,P+n-1,bn-1|^{v}_{W(0)}|^{NIL}_{0})} ... |^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})} ∧

 (Post(bn)|^{v}_{W(0)}|^{NIL}_{0}|^{m}_{α(m,P+n-1,bn-1|^{v}_{W(0)}|^{NIL}_{0})} ... |^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})} →

 ∀R2,...,RN(b1) ( ∀R2,...,RN(b2) ( ...

  ∀R2,...,RN(bn-1) ( ∀R2,...,RN(bn) ( (Q|^{P}_{P+1}|^{m}_{α(m,P+1,bn|^{v}_{W(1-n)}|^{NIL}_{0})} ) )

   |^{P}_{P+1}|^{m}_{α(m,P+1,bn-1|^{v}_{W(2-n)}|^{NIL}_{0})} ) )

  ...

   |^{P}_{P+1}|^{m}_{α(m,P+1,b2|^{v}_{W(-1)}|^{NIL}_{0})} ) )

   |^{P}_{P+1}|^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})} ) )

 )) ... ))

Use of subrule 3 allows us to conclude for any i

Ri ¬∈ α(m,P+1,bj|^{v}_{W(1-j)}|^{NIL}_{0})

This allows us to apply subrule 18b repeatedly to reduce the part of the formula after the
Post(bn) term to:


∀R2,...,RN(b1) ( ∀R2,...,RN(b2) ( ...

 ∀R2,...,RN(bn-1) ( ∀R2,...,RN(bn) (

  Q|^{P}_{P+1}|^{m}_{α(m,P+1,bn|^{v}_{W(1-n)}|^{NIL}_{0})}

   |^{P}_{P+1}|^{m}_{α(m,P+1,bn-1|^{v}_{W(2-n)}|^{NIL}_{0})}

   ...

   |^{P}_{P+1}|^{m}_{α(m,P+1,b2|^{v}_{W(-1)}|^{NIL}_{0})}

   |^{P}_{P+1}|^{m}_{α(m,P+1,b1|^{v}_{W(0)}|^{NIL}_{0})}

 )) ... ))

We now apply subrule 17b repeatedly to the for all's. By the same method as was used
on the Pre and Post terms, we can consolidate the P+1 substitutions. This will result in all the
W's being W(0), so we may use lower case w in place of them. The above part becomes:

∀R2,...,Rmax(N(b1),...,N(bn)) ( Q|^{P}_{P+n}|^{m}_{α(m,P+n,bn|^{v}_{w}|^{NIL}_{0})} ... |^{m}_{α(m,P+1,b1|^{v}_{w}|^{NIL}_{0})} )


Note that the series of substitutions

|^{m}_{α(m,P+i,bi|^{v}_{w}|^{NIL}_{0})} ... |^{m}_{α(m,P+1,b1|^{v}_{w}|^{NIL}_{0})}

is equivalent to replacing m by an array in which the P+1st element has been replaced by
b1|^{v}_{w}|^{NIL}_{0}, the P+2nd element by b2|^{v}_{w}|^{NIL}_{0}, ..., and the P+ith element by
bi|^{v}_{w}|^{NIL}_{0}.

For the sake of smaller formulas, we will call such an array β(i). Using this notation we have
the COMPLIS rule:


Pre(b1)|^{v}_{w}|^{NIL}_{0} ∧ (Post(b1)|^{v}_{w}|^{NIL}_{0} →

 Pre(b2)|^{v}_{w}|^{NIL}_{0}|^{m}_{β(1)} ∧ (Post(b2)|^{v}_{w}|^{NIL}_{0}|^{m}_{β(1)} →

 ...

 Pre(bn-1)|^{v}_{w}|^{NIL}_{0}|^{m}_{β(n-2)} ∧ (Post(bn-1)|^{v}_{w}|^{NIL}_{0}|^{m}_{β(n-2)} →

 Pre(bn)|^{v}_{w}|^{NIL}_{0}|^{m}_{β(n-1)} ∧ (Post(bn)|^{v}_{w}|^{NIL}_{0}|^{m}_{β(n-1)} →

 ∀R2,...,Rmax(N(b1),...,N(bn)) ( Q|^{P}_{P+n}|^{m}_{β(n)} )

 )) ... ))

{ FCOMPLIS(<b1 ... bn>,M,LOCTABLE) } Q

if R1 ¬∈ Q.


A.8.13  MKPUSH 

Here we derive a Hoare rule to describe the action of the instructions represented by
FMKPUSH. From the code of MKPUSH we have:

FMKPUSH(n,m) = < < 'PUSH 'P m >
                 ! FMKPUSH(n,m+1)
               >

if n ≥ m, else an empty list is the result. Induction on m shows us that

FMKPUSH(n,m) = < < 'PUSH 'P m >
                 < 'PUSH 'P m+1 >
                 ...
                 < 'PUSH 'P n >
               >

for n ≥ m. For the case of m = 1, we have:

FMKPUSH(n,1) = < < 'PUSH 'P 1 >
                 < 'PUSH 'P 2 >
                 ...
                 < 'PUSH 'P n >
               >


for n ≥ 1. Using the Hoare rule for PUSH:

Q|^{m}_{α(m,P,Ri)}|^{P}_{P+1}  { < 'PUSH 'P i > }  Q

we obtain the effect of MKPUSH as:

Q|^{m}_{α(m,P,Rn)}|^{P}_{P+1}|^{m}_{α(m,P,Rn-1)}|^{P}_{P+1} ... |^{m}_{α(m,P,R1)}|^{P}_{P+1}  { FMKPUSH(n,1) }  Q

We may apply subrule 8 repeatedly and simplify arithmetically to get:

Q|^{P}_{P+1} ... |^{P}_{P+1}|^{m}_{α(m,P+n,Rn)} ... |^{m}_{α(m,P+1,R1)}  { FMKPUSH(n,1) }  Q

We apply subrule 9 repeatedly and simplify arithmetically to get:

Q|^{P}_{P+n}|^{m}_{α(m,P+n,Rn)} ... |^{m}_{α(m,P+1,R1)}  { FMKPUSH(n,1) }  Q

In applying this rule, we will be doing so on a Q containing the term m[P-n+i], where i
is a constant. The first substitution of the MKPUSH rule will produce m[P+i]. Now for all
of the α substitutions made before α(m,P+i,Ri), we will be able to apply the simplification rule:

α(m,j,x)[k] = m[k] if j ≠ k

This gives m[P+i] still. The substitution where j = k uses the simplification rule:

α(m,j,x)[j] = x

This gives Ri. Further substitutions beyond the ith α may be ignored by virtue of subrule 4.
Thus applying the MKPUSH rule to m[P-n+i] gives Ri.
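
The conclusion that m[P-n+i] holds Ri after FMKPUSH(n,1) can be illustrated by simulating the pushes. The sketch below is a Python model under assumptions: `alpha`, `fmkpush`, and `run_pushes` are hypothetical names, the stack is a dictionary indexed by address, and a PUSH of register i is taken to increment P and then store Ri at m[P], which is the reading suggested by the PUSH rule above.

```python
def alpha(m, j, x):
    """alpha(m, j, x): a copy of array m whose element j has been replaced by x."""
    updated = dict(m)
    updated[j] = x
    return updated


def fmkpush(n, m=1):
    """Closed form of FMKPUSH(n, m): push registers m, m+1, ..., n in order."""
    return [("PUSH", "P", i) for i in range(m, n + 1)]


def run_pushes(code, regs, mem, p):
    """Execute PUSH instructions: P becomes P+1, then m becomes alpha(m, P, Ri)."""
    for _op, _ptr, i in code:
        p = p + 1
        mem = alpha(mem, p, regs[i])
    return mem, p


if __name__ == "__main__":
    regs = {1: "x1", 2: "x2", 3: "x3"}
    mem, p = run_pushes(fmkpush(3), regs, {}, 10)
    # After the pushes, m[P-n+i] holds Ri for each i from 1 to n.
    print(p, [mem[p - 3 + i] for i in (1, 2, 3)])
```

The two simplification rules for α correspond to reading an updated dictionary at the updated key and at any other key.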


A.8.14  LOADAC 

Here we derive a Hoare rule to describe the action of the instructions represented by
FLOADAC. From the code of LOADAC we have:

FLOADAC(m,k) = < < 'MOVE k m 'P >
                 ! FLOADAC(m+1,k+1)
               >

if m ≤ 0, else an empty list is the result. Induction on m shows us that

FLOADAC(m,k) = < < 'MOVE k m 'P >
                 < 'MOVE k+1 m+1 'P >
                 ...
                 < 'MOVE k-m 0 'P >
               >

if m ≤ 0. For the case where m = 1-n (n > 0) and k = 1 we have:

FLOADAC(1-n,1) = < < 'MOVE 1 1-n 'P >
                   < 'MOVE 2 2-n 'P >
                   ...
                   < 'MOVE n 0 'P >
                 >


Using the Hoare rule for MOVE:

Q|^{Ri}_{m[P+j]}  { < 'MOVE i j 'P > }  Q

we obtain the effect of LOADAC as:

Q|^{Rn}_{m[P]}|^{Rn-1}_{m[P-1]} ... |^{R1}_{m[P-n+1]}  { FLOADAC(1-n,1) }  Q


The R's are obviously distinct from each other, and ¬∈ the stack references (the m's). We may
therefore apply subrule 7b to reverse the order of substitution. The FLOADAC rule is then:

Q|^{R1}_{m[P-n+1]}|^{R2}_{m[P-n+2]} ... |^{Rn}_{m[P]}  { FLOADAC(1-n,1) }  Q
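
The effect described by the FLOADAC rule can be illustrated by generating and executing the MOVE instructions. This Python sketch rests on assumptions: `floadac` and `run_moves` are hypothetical names, and < 'MOVE i j 'P > is modeled as loading register i from stack location m[P+j], following the MOVE rule above.

```python
def floadac(m, k):
    """Closed form of FLOADAC(m, k): MOVE k m, MOVE k+1 m+1, ..., until m exceeds 0."""
    code = []
    while m <= 0:
        code.append(("MOVE", k, m, "P"))
        m, k = m + 1, k + 1
    return code


def run_moves(code, regs, mem, p):
    """Execute MOVE i j P instructions: Ri becomes m[P+j]."""
    regs = dict(regs)
    for _op, i, j, _ptr in code:
        regs[i] = mem[p + j]
    return regs


if __name__ == "__main__":
    n = 3
    mem = {8: "b1", 9: "b2", 10: "b3"}      # stack with P = 10 after n pushes
    print(run_moves(floadac(1 - n, 1), {}, mem, 10))
```

Because each MOVE writes a distinct register and reads only the stack, the order of the moves does not matter, which is the informal content of the appeal to subrule 7b.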


A.9  Example  Equivalence  Proof 

The following is a proof of the semantic equivalence of the output produced by the
compiler CO and that produced by the compiler C4 for the source language syntactic case of a
function call. C4 is an optimizing version of CO which was proved by London [London 71].
MCO, the modification of CO that is in this dissertation, produces the same code for this case
as CO. The equivalence proof is presented here to demonstrate one approach to proving
compilers with optimizations. Those interested in the details of this proof are referred to the
above-referenced report for the listing of C4, explanations of how it works, and the lemmas
later referenced here. The terminology used here continues in the vein used in the MCO
proof, though the methods more closely approximate London's.

First  we  will  write  the  result  of  CO  for  a function  call,  as  obtained  during  the  MCO  part 
one  proof.  We  will  assume  that  the  input  is  of  form  <f  bl  b2  ...  bn>. 

FCOMPEXP(<f b1 b2 ... bn>,M,LOCTABLE) =
    < ! FCOMPLIS(<b1 b2 ... bn>,M,LOCTABLE)
      ! FLOADAC(1-n,1)
      < 'SUB 'P < 'C 0 0 n n > >
      < 'CALL n < 'E f > >
    >

We  will  now  expand  the  FCOMPLIS  and  FLOADAC  terms  by  the  use  of  the  forms  we 
derived  during  the  MCO  proof. 
FCOMPEXP(<f b1 b2 ... bn>,M,LOCTABLE) =
    < ! FCOMPEXP(b1,M,LOCTABLE)
      < 'PUSH 'P 1 >
      ! FCOMPEXP(b2,M-1,LOCTABLE)
      < 'PUSH 'P 1 >
      ...
      ! FCOMPEXP(bn,M-n+1,LOCTABLE)
      < 'PUSH 'P 1 >
      < 'MOVE 1 1-n 'P >
      < 'MOVE 2 2-n 'P >
      ...
      < 'MOVE n 0 'P >
      < 'SUB 'P < 'C 0 0 n n > >
      < 'CALL n < 'E f > >
    >


We  will  now  hand  execute  this  code  to  determine  its  effects.  The  first  item  of  code,  the 
FCOMPEXP  of  bl,  produces  the  value  of  argument  bl  in  register  RI,  at  the  possible  expense 
of  wiping  out  the  other  registers.  We  will  represent  this  in  London’s  trace  notation,  in  which 
values  are  written  to  the  right  of  target  variable  names,  then  "crossed  out"  by  appending  an 
asterisk  when  a new  value  takes  its  place.  In  this  notation  we  have  so  far: 

R1: b1
R2: undefined
...
RN: undefined

where  N is  the  number  of  the  last  register  available  for  target  code  use.  We  will  now  add  the 
stack  to  our  notation  with  a separate  line  for  each  location  used  in  the  stack.  We  now  execute 
the  first  push  to  obtain: 

RI:  bl 

R2:  undefined 

RN:  undefined 
stack:  bl 


After executing through the last push we have:

R1:    b1* b2* ... bn
R2:    undefined* undefined* ... undefined
...
RN:    undefined* undefined* ... undefined
stack: b1
       b2
       ...
       bn

Executing  the  moves  places  the  values  in  the  stack  into  the  registers,  while  the  SUB  removes  n 
items  from  the  stack,  resulting  in: 

R1:    b1* b2* ... bn* b1
R2:    undefined* undefined* ... undefined* b2
...
Rn:    undefined* undefined* ... undefined* bn
...
RN:    undefined* undefined* ... undefined
stack:
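
The hand execution above can be mirrored by a small simulation. The following Python sketch is an illustrative model only: the instruction tuples, the `EVAL` pseudo-instruction standing in for the code of FCOMPEXP(bi, ...), and the function names are assumptions. `EVAL` is modeled pessimistically as leaving the argument value in R1 and making every other register undefined.

```python
UNDEFINED = "undefined"


def c0_call_prefix(args):
    """Model of the CO code for a call, up to but excluding the CALL instruction."""
    n = len(args)
    code = []
    for b in args:
        code.append(("EVAL", b))          # stands in for ! FCOMPEXP(b, ...)
        code.append(("PUSH", 1))          # < 'PUSH 'P 1 >
    for i in range(1, n + 1):
        code.append(("MOVE", i, i - n))   # < 'MOVE i i-n 'P >
    code.append(("SUB", n))               # < 'SUB 'P < 'C 0 0 n n > >
    return code


def run(code, num_regs, p=0):
    """Execute the modeled instructions; return the final registers and stack pointer."""
    regs = {i: UNDEFINED for i in range(1, num_regs + 1)}
    mem = {}
    for instr in code:
        op = instr[0]
        if op == "EVAL":
            regs = {i: UNDEFINED for i in regs}   # other registers may be wiped out
            regs[1] = instr[1]
        elif op == "PUSH":
            p += 1
            mem[p] = regs[instr[1]]
        elif op == "MOVE":
            regs[instr[1]] = mem[p + instr[2]]
        elif op == "SUB":
            p -= instr[1]
    return regs, p


if __name__ == "__main__":
    print(run(c0_call_prefix(["b1", "b2", "b3"]), num_regs=5, p=7))
```

In this model the final state has R1 through Rn holding b1 through bn, the remaining registers undefined, and the stack pointer restored, matching the last trace.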

Because the result of compiling on C4 ends with the identical call statement, we will not
execute the call. We will now derive the form of the C4 result and execute it (except for the
call) to show that the same state exists in the target machine as after executing the CO results.
This is the sense in which we consider the results semantically equivalent.

It  is  obvious  from  the  code  of  C4  that  its  output  for  this  case  is: 

FCOMPEXP(<f b1 b2 ... bn>,M,LOCTABLE) =
    < ! FCOMPLISA(<b1 b2 ... bn>,M,LOCTABLE)
      < 'CALL n < 'E f > >
    >


Because  COMPEXP,  the  main  compiling  routine,  has  been  redefined  in  C4,  FCOMPEXP  is 
of  course  redefined  from  that  of  CO.  The  same  will  be  true  of  COMPLIS  and  LOADAC,  so 
we  must  rederive  forms  output  by  those  routines. 

We may obtain an expression for COMPLISA directly from its code, and expand
FCOMPLISA to obtain:

FCOMPEXP(<f b1 b2 ... bn>,M,LOCTABLE) =
    < ! FCOMPLIS(CLASSIFY(<b1 b2 ... bn>),M,1,LOCTABLE)
      ! FLOADAC(CLASSIFY(<b1 b2 ... bn>),
                1-CCOUNT(CLASSIFY(<b1 b2 ... bn>)),
                1,
                M-CCOUNT(CLASSIFY(<b1 b2 ... bn>)),
                LOCTABLE)
      ! SUBSTACK(CCOUNT(CLASSIFY(<b1 b2 ... bn>)))
      < 'CALL n < 'E f > >
    >


We know by London's Lemma 7 that CLASSIFY(<b1 b2 ... bn>) =
< <d1 . b1> <d2 . b2> ... <dn . bn> >, where di is given by the syntactic type of the
corresponding bi according to this table:

di    bi type

0     T, NIL, numeric atom
1     other atom
2     quoted expression
3     CAR-CDR chain ending in an atom
4     other expression (except last)
5     last other expression

By Lemma 8 we know that CCOUNT(< <d1 . b1> <d2 . b2> ... <dn . bn> >) is the number of
di's that are 4, which we shall call Z4. Let e be an array in which we place the subscripts of
the d's that are 4. Then de[1] is the first 4, de[2] is the second, and de[Z4] is the last (if any
d's are 4). Note that the notation de[1] means di where i = e[1]. Let Z5 be 0 if no d's are 5, else
Z4+1. Then let e[Z5] be the subscript of the d that is 5 (if it exists). We may expand
SUBSTACK by use of Lemma 9. With this terminology, the C4 result of compiling a
function call is:

FCOMPEXP(<f b1 b2 ... bn>,M,LOCTABLE) =
    < ! FCOMPLIS(< <d1 . b1> <d2 . b2> ... <dn . bn> >,M,1,LOCTABLE)
      ! FLOADAC(< <d1 . b1> <d2 . b2> ... <dn . bn> >,
                1-Z4,
                1,
                M-Z4,
                LOCTABLE)
      < 'SUB 'P < 'C 0 0 Z4 Z4 > >
      < 'CALL n < 'E f > >
    >

unless  Z4  is  zero,  in  which  case  the  SUB  is  omitted. 
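
The quantities Z4, Z5, and the array e introduced above can be computed from a list of d values as follows. This Python sketch is illustrative only: the function name `summarize_classification` is hypothetical, the d values are supplied directly rather than derived by CLASSIFY, and the sketch assumes (as the table implies) that at most one d is 5 and that it follows every 4.

```python
def summarize_classification(ds):
    """Return (Z4, Z5, e) for d values d1..dn given as a Python list.

    e is a dict indexed from 1: e[1]..e[Z4] are the subscripts of the d's that
    are 4, and e[Z5] is the subscript of the d that is 5, if there is one.
    """
    fours = [i for i, d in enumerate(ds, start=1) if d == 4]
    fives = [i for i, d in enumerate(ds, start=1) if d == 5]
    z4 = len(fours)
    z5 = z4 + 1 if fives else 0
    e = {k: idx for k, idx in enumerate(fours + fives, start=1)}
    return z4, z5, e


if __name__ == "__main__":
    # Arguments 2 and 4 are "other expressions", argument 5 is the last one.
    print(summarize_classification([1, 4, 0, 4, 5, 3]))
```

With these values, Z4 is the number of items pushed on the stack, which is why the SUB instruction removes Z4 items and is omitted when Z4 is zero.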


We must now derive expressions for FCOMPLIS and FLOADAC. From the code of
COMPLIS we have:

FCOMPLIS(< <dK . bK> <dK+1 . bK+1> ... <dn . bn> >,M,K,LOCTABLE) =
    If null first argument then < >
    Else if dK = 4 then < ! FCOMPEXP(bK,M,LOCTABLE)
                          < 'PUSH 'P 1 >
                          ! FCOMPLIS(< <dK+1 . bK+1> ... <dn . bn> >,
                                     M-1,K+1,LOCTABLE)
                        >
    Else if dK = 5 then < ! FCOMPEXP(bK,M,LOCTABLE)
                          < 'MOVE K 1 >
                        >
    Else < ! FCOMPLIS(< <dK+1 . bK+1> ... <dn . bn> >,M,K+1,LOCTABLE) >

unless  K is  I,  in  which  case  the  move  is  omitted. 

Theorem A: With the terminology given above, FCOMPLIS is given by:

FCOMPLIS(< <dK . bK> <dK+1 . bK+1> ... <dn . bn> >,M,K,LOCTABLE) =
    < ! FCOMPEXP(be[1],M,LOCTABLE)
      < 'PUSH 'P 1 >
      ! FCOMPEXP(be[2],M-1,LOCTABLE)
      < 'PUSH 'P 1 >
      ...
      ! FCOMPEXP(be[Z4],M-Z4+1,LOCTABLE)
      < 'PUSH 'P 1 >
      ! FCOMPEXP(be[Z5],M-Z4,LOCTABLE)
      < 'MOVE e[Z5] 1 >
    >

unless Z5 is 1, in which case the move is omitted. Of course if Z4 is zero, the first 2*Z4 terms
are vacuous, and if Z5 is zero, the entire expression is vacuous.

The proof of theorem A is a straightforward application of induction on the structure
of the first argument of COMPLIS. It is proved by the cases reflected in the code of
COMPLIS. It might be pointed out that e[1] must mark the first occurrence of a type 4 in
the range from K to n, not 1 to n.

The way in which LOADAC optimizes target code is that it generates all arguments
possible in the register that they are to occupy at the time the function is called. The
restriction is that the argument must not disturb other registers, since they may already have
other arguments in them. It is exactly the types which have d's of 3 or less that may be
generated with the use of only one register. Thus LOADAC will generate exactly the same
code for each of these types that a recursion to the main compiling routine would, except that
the register to be used is changed according to which argument we are compiling. We will
therefore define a new function to describe the code output by LOADAC for these types. We
will call it KFCOMPEXP, and define it as follows.
KFCOMPEXP(K,EXP,M,LOCTABLE) =
    If DTYPE(EXP) = 1 then < <'MOVE K M+CDR(ASSOC(EXP,LOCTABLE)) 'P> >
    Else if DTYPE(EXP) = 0 then < <'MOVEI K <'QUOTE EXP>> >
    Else if DTYPE(EXP) = 2 then < <'MOVEI K EXP> >
    Else if DTYPE(EXP) = 3 then < ! REVERSE(COMPC(EXP,K,M,LOCTABLE)) >

where DTYPE(EXP) is the value of d that would result from applying CLASSIFY to a list
containing EXP.

Lemma A: KFCOMPEXP gives a result that is semantically equivalent to
FCOMPEXP over the types for which the former is defined, except that KFCOMPEXP uses
only register K instead of only register 1.

The proof is by source language syntactic type cases. For ISNIL(EXP) we have
DTYPE(EXP) = 0. KFCOMPEXP and FCOMPEXP are given by (respectively):

< <'MOVEI K <'QUOTE EXP>> >

< <'MOVEI 1 0> >

Because 0 and NIL (or quote NIL) are treated the same in the target language, this case is
proved. For IST(EXP) we have DTYPE(EXP) = 0. KFCOMPEXP is the same as in the
previous case, while FCOMPEXP is:

< <'MOVEI 1 <'QUOTE EXP>> >

which is the same except for the register. For ISNUMBER(EXP) we have DTYPE(EXP) = 0.
KFCOMPEXP and FCOMPEXP are both the same as in the previous case. For
ISIDENTIFIER(EXP) we have DTYPE(EXP) = 1. The results are:

< <'MOVE K M+CDR(ASSOC(EXP,LOCTABLE)) 'P> >

< <'MOVE 1 M+CDR(ASSOC(EXP,LOCTABLE)) 'P> >

For ISQUOTE(EXP) we have DTYPE(EXP) = 2. The results are:

< <'MOVEI K EXP> >

< <'MOVEI 1 EXP> >

For ISCARORCDR(EXP) we have DTYPE(EXP) = 3 if EXP is a CAR-CDR chain ending
in an atom, but DTYPE(EXP) > 3 otherwise. We may ignore cases where DTYPE(EXP) > 3
because KFCOMPEXP is not defined. For a CAR-CDR chain of form
(Cθ1R (Cθ2R ... (CθNR a) ... )) with each θi either an A or a D (to form CAR or CDR), we may
use the expansion of Theorem 9 to get for KFCOMPEXP:
< <'HεNRZ@ K M+CDR(ASSOC(a,LOCTABLE)) 'P>
  <'HεN-1RZ@ K K>
  ...
  <'Hε2RZ@ K K>
  <'Hε1RZ@ K K>
>

where εi is L if θi is A and is R if θi is D (to form HLRZ@ or HRRZ@). Examination of the
code of C4 and a simple induction on the depth of nesting gives us the following expression
for FCOMPEXP:

< <'HεNRZ@ 1 M+CDR(ASSOC(a,LOCTABLE)) 'P>
  <'HεN-1RZ@ 1 1>
  ...
  <'Hε2RZ@ 1 1>
  <'Hε1RZ@ 1 1>
>


Since  all  other  syntactic  types  have  DTYPE  > 3,  we  have  proved  the  lemma. 

We  now  derive  the  expression  for  FLOADAC.  From  the  code  of  LOADAC  we  have: 

FLOADAC(< <dN2 . bN2> <dN2+1 . bN2+1> ... <dn . bn> >,M2,N2,M,LOCTABLE) =
    If null first argument then < >
    Else if dN2 ≤ 3 then < ! KFCOMPEXP(N2,bN2,M,LOCTABLE)
                           ! FLOADAC(< <dN2+1 . bN2+1> ... <dn . bn> >,
                                     M2,N2+1,M,LOCTABLE)
                         >
    Else if dN2 = 5 then < ! FLOADAC(< <dN2+1 . bN2+1> ... <dn . bn> >,
                                     1,N2+1,M,LOCTABLE)
                         >
    Else < <'MOVE N2 M2 'P>
           ! FLOADAC(< <dN2+1 . bN2+1> ... <dn . bn> >,
                     M2+1,N2+1,M,LOCTABLE)
         >

Theorem  B:  With  the  terminology  given  above,  FLOADAC  is  given  by: 
FLOADAC(< <dN2 . bN2> <dN2+1 . bN2+1> ... <dn . bn> >,M2,N2,M,LOCTABLE) =
    < ! KFCOMPEXP(N2,bN2,M,LOCTABLE)
      ! KFCOMPEXP(N2+1,bN2+1,M,LOCTABLE)
      ...
      <'MOVE e[1] M2 'P>
      <'MOVE e[2] M2+1 'P>
      ...
      <'MOVE e[Z4] M2+Z4-1 'P>
      ...
      ! KFCOMPEXP(n,bn,M,LOCTABLE)
    >

with the e[Z5] line (if any) missing. The interpretation of this is that the KFCOMPEXP
forms occur for every value of N2 up to n except for the values e[1] ... e[Z4], at which the
move form occurs instead. It should be understood that e[1] must mark the first occurrence of
a type 4 in the range from N2 to n.

The proof of theorem B is a straightforward application of induction on the structure
of the first argument of LOADAC. It is proved by the cases reflected in the code of
LOADAC.

We  may  now  substitute  these  forms  into  our  expression  for  the  result  of  compiling  a 
function  with  C4.  The  result  is: 

FCOMPEXP(<f b1 b2 ... bn>,M,LOCTABLE) =
    < ! FCOMPEXP(be[1],M,LOCTABLE)
      < 'PUSH 'P 1 >
      ! FCOMPEXP(be[2],M-1,LOCTABLE)
      < 'PUSH 'P 1 >
      ...
      ! FCOMPEXP(be[Z4],M-Z4+1,LOCTABLE)
      < 'PUSH 'P 1 >
      ! FCOMPEXP(be[Z5],M-Z4,LOCTABLE)
      < 'MOVE e[Z5] 1 >
      ! KFCOMPEXP(1,b1,M-Z4,LOCTABLE)
      ! KFCOMPEXP(2,b2,M-Z4,LOCTABLE)
      ...
      < 'MOVE e[1] 1-Z4 'P >
      < 'MOVE e[2] 2-Z4 'P >
      ...
      < 'MOVE e[Z4] 0 'P >
      ...
      ! KFCOMPEXP(n,bn,M-Z4,LOCTABLE)
      < 'SUB 'P < 'C 0 0 Z4 Z4 > >
      < 'CALL n < 'E f > >
    >


With the e[Z5] line missing from the KFCOMPEXPs, and if Z5 = 1 then the first move is omitted.
The e's include all type 4s between 1 and n.

We will now execute this code (again, except for the final call) to determine if it has the
same final values in the trace as the C0 code did. The trace just before executing the SUB
instruction is:

R1:       be[1]*      be[2]*      ...  be[Z4]*     be[Z5]*     b1
R2:       undefined*  undefined*  ...  undefined*  undefined*  b2
Re[1]:    undefined*  undefined*  ...  undefined*  undefined*  be[1]
Re[2]:    undefined*  undefined*  ...  undefined*  undefined*  be[2]
Re[Z4]:   undefined*  undefined*  ...  undefined*  undefined*  be[Z4]
Re[Z5]:   undefined*  undefined*  ...  undefined*  undefined*  be[Z5]
Rn:       undefined*  undefined*  ...  undefined*  undefined*  bn
RN:       undefined*  undefined*  ...  undefined*  undefined
stack:    be[1]
          be[2]
          ...
          be[Z4]

The SUB instruction will remove Z4 items from the stack and will leave the machine state
exactly as we saw for the C0 code.


A.10 Subrule Justification

In this section we will justify some of the sequential substitution simplification rules
(subrules) by use of a few basic principles, and give the vein in which the proof of the others
may be approached. The subrules are given in Section 3.6 in the forms found to be useful
during the part two proofs.

The definition of the notation

    $Q|^{D}_{H}$

is the substitution of the expression H for all free occurrences of the identifier D in the
formula Q, with the caution that H must not have free uses of any identifier that become
bound when introduced into Q. As in the previous discussion of substitution, all D's and X's,
numbered or not, will represent atomic identifiers.

Towards the goal of simplifying such substitutions, we wish to define D not-in A
(denoted $D \notin A$) as meaning that there are no free uses of the identifier D in the expression
A. The complication that arises is that in all the part two proofs we express expressions such
as A symbolically in terms of the compiler variables, not in the source or target language terms
that we would wish to use in A. Thus we may not inspect A to see if indeed a target or
source language variable (identifier) appears in A, but must use the subrules to prove that it
does or does not.

The basic principle we will use to prove the not-in subrules is that an identifier D is
not-in the whole if it is not-in the parts of a target language or source language expression.
Formally stated, this is:

    $D \notin G_i \ (\text{for } 1 \le i \le n) \rightarrow D \notin (f\ G_1 \ldots G_n)$

for all source language functions f. Those familiar with Lisp will recognize the (f G1 ... Gn)
as the source language (of MCO) notation for a function call. Since there is no translation of
function names between source and target language, f is also a target language function. This
is subrule 13b, and it forms the basis for proving most of the other not-in subrules. The
converse of subrule 13b is also true and will occasionally be used under the name subrule 13a.

We should note that subrule 13b precludes the possibility that D can somehow be
created by the combination of items that do not individually contain D. This is why D must
be an atomic identifier, not an expression. Because we will use not-in frequently in our
substitution rules, we require that the object which we substitute for is also atomic.

If H represents a target language or source language expression that is atomic, then the
determination of $D \notin H$ is equivalent to the observation that $D \ne H$. This follows directly
from the definition of not-in, and may be expressed by:

    $D1 \ne D2 \rightarrow D1 \notin D2$

    $D1 \notin D2 \rightarrow D1 \ne D2$

The first is called subrule 11, and the second (its converse) will be called subrule 11b. It must
be understood that a statement such as $P \ne A$ does not mean that P does not have the same
value as A, but that the identifier P is not the identifier A for purposes of deciding if a
substitution involving P need be made on an expression using the identifier A. The
statement $P \ne A$ is the equivalent of the Lisp expression (NOT (EQ 'P 'A)).

Subrule 11 is usually applied to identifiers that are proved unequal by subrule 2, which
is simply a recognition of the fact that source language variables are distinct objects from
target language ones, and that various classes of target language variables are distinct from
each other.

Before we proceed, we need to investigate quantification. It is introduced into the
source language only by the assertions, but is also introduced into the resulting verification
conditions by Hoare rules with quantification in them. By the definition of not-in, we can see
that if an identifier is not-in an expression, it is not-in (as a free use, recall) the same
expression but with an identifier bound by quantification. Formally expressed, we have:

    $D = X \lor D \notin G \rightarrow D \notin \forall X\,(G)$

and

    $D = X \lor D \notin G \rightarrow D \notin \exists X\,(G)$

These are the parallels of subrule 13b, but for quantification instead of functions. We will
call them axioms A1 and A2, respectively. Note that the converse of the quantification rules
is not true without further conditions. The converses with conditions will be called axioms B1
and B2, and may be expressed as

    $D \notin \forall X\,(G) \land D \ne X \rightarrow D \notin G$

    $D \notin \exists X\,(G) \land D \ne X \rightarrow D \notin G$
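
The not-in rules given so far (subrules 11, 11b, 13a, 13b and axioms A1, A2, B1, B2) together determine a recursive test for free occurrence. The following is a minimal sketch, assuming a hypothetical tuple encoding that is not part of the original proofs: an atomic identifier is a string, a function call is `("call", f, [args])`, and a quantification is `("forall", X, G)` or `("exists", X, G)`.

```python
def not_in(d, e):
    """Return True when identifier d has no free use in expression e."""
    if isinstance(e, str):
        # Atomic case (subrules 11 and 11b): not-in is identifier inequality.
        return d != e
    tag = e[0]
    if tag == "call":
        # Subrules 13a and 13b: d is not-in the call exactly when it is
        # not-in every argument. The function name itself is not inspected.
        return all(not_in(d, g) for g in e[2])
    if tag in ("forall", "exists"):
        # Axioms A1/A2 and their conditioned converses B1/B2: a bound
        # identifier has no free use, otherwise look inside the body.
        _, x, g = e
        return d == x or not_in(d, g)
    raise ValueError("unknown expression form")


# (F A forall X ((G X B))) in the hypothetical encoding
expr = ("call", "F", ["A", ("forall", "X", ("call", "G", ["X", "B"]))])
print(not_in("P", expr))  # P does not occur at all
print(not_in("X", expr))  # X occurs only bound
print(not_in("B", expr))  # B occurs free
```

In the proofs themselves the expression is known only symbolically, so this test cannot actually be run on it. The subrules play the role of this recursion, applied one step at a time.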

We can now prove subrule 1b. It states that P (a specific target language variable) is
not-in any source language expression. All source or target language expressions are built of
nested function calls and quantification. The proof is then one of induction on the depth of
nesting. The base step is established by using subrule 2b ($P \ne$ any source variable) and
subrule 11. It might be pointed out that for purposes of this discussion we may treat source
constants as if they were source variables. That is, we may use subrule 2b to show $P \ne 2$,
where 2 is a source language constant. The induction step simply requires the use of subrule
13b and the corresponding rules for quantification.

Exactly the same proof holds for subrules 1a and 1c, since they also involve showing
that certain target language variables are not-in source language expressions. The proof of
subrule 1d is the same except that the roles of target and source language have been reversed.

We  will  now  return  to  substitution.  The  basic  axiom  we  will  assume  about  substitution 
is  similar  to  the  one  for  not-in.  It  may  be  characterized  by  stating  that  substitution  distributes 
over  the  arguments  of  source  language  functions,  and  is  expressed  formally  by: 


    $(f\ G_1 \ldots G_n)|^{D}_{H} = (f\ G_1|^{D}_{H} \ldots G_n|^{D}_{H})$

where  f is  any  source  language  function.  This  is  subrule  6. 

By referring to the definition of substitution we can state the effects of applying a
substitution to an identifier. There are two cases: one when it is the identifier to be
substituted for, and the other when it is not.


    $D|^{D}_{H} = H$

    $D \ne D1 \rightarrow D1|^{D}_{H} = D1$


The former is subrule 21, and the latter we shall call axiom C.

We will also need to give axioms for the interface between substitution and
quantification. These may be stated in words as: substituting for the free occurrences of an
identifier that is quantified has no effect, and substitutions may be done inside quantification
if we do not substitute for the quantified identifier nor introduce a use of the quantified
identifier that gets bound. The axioms, along with their names, are


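19:    $\forall X\,(G)|^{X}_{H} = \forall X\,(G)$

19b:   $\exists X\,(G)|^{X}_{H} = \exists X\,(G)$

18a:   $X \ne D \land X \notin H \rightarrow \forall X\,(G)|^{D}_{H} = \forall X\,(G|^{D}_{H})$

18a2:  $X \ne D \land X \notin H \rightarrow \exists X\,(G)|^{D}_{H} = \exists X\,(G|^{D}_{H})$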


Subrule 18b is simply a generalization of 18a for use in cases having multiple substitutions or
quantifications. It is easily derived from subrule 18a by induction on the number of
quantifications and induction on the number of substitutions.

By the definition of substitution we are not allowed to introduce by means of
substitution a free use of an identifier into a place where it becomes bound. The following
axioms characterize this as giving an undefined result.


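D1:    $X \ne D \land X \in H \rightarrow \forall X\,(G)|^{D}_{H} = \forall X\,(G|^{D}_{\text{undefined}})$

D2:    $X \ne D \land X \in H \rightarrow \exists X\,(G)|^{D}_{H} = \exists X\,(G|^{D}_{\text{undefined}})$

We will now prove subrule 4:

    $D \notin G \rightarrow G|^{D}_{H} = G$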

By repeated distribution of the D-H substitution by subrule 6 we get an expression equal to G
with the D-H substitution that has that substitution done on each atomic identifier, except
that in the case of identifiers within quantified expressions the substitution is applied to
the outermost quantified expression. For example, if G were
$(F1\ A\ (F2\ B\ \forall X\,((F3\ X\ C))))$, the result of the D-H substitution upon G would become:

    $(F1\ A|^{D}_{H}\ (F2\ B|^{D}_{H}\ \forall X\,((F3\ X\ C))|^{D}_{H}))$

Now we repeatedly distribute D not-in G as far as possible into G by subrule 13a. Then we
apply subrule 11b at all the atomic identifiers to which we have pushed the not-in. This
allows us to use axiom C and drop all the D-H substitutions on the atomic identifiers. But we
still have the D-H substitutions at all outermost quantifications. Let the quantified identifier
under discussion be denoted X. Now two cases arise: either $D = X$ or $D \ne X$. If $D = X$, apply
subrule 19 or 19b (depending on which kind of quantification it is) to discard the substitution.
If $D \ne X$ and $X \notin H$, we apply subrule 18a or 18a2. If $D \ne X$ but $X \in H$, use D1 or D2 to
move the substitution inside the quantification. We may also apply axioms B1 or B2 to move
the not-in inside the quantification. By these means we will eventually move the substitution
and the not-in down in all cases to either an atomic identifier upon which axiom C applies or
a quantification upon which subrules 19 or 19b apply. In all cases the substitution will be
discarded, resulting in the original form of G being left. Hence subrule 4 is proved.
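
The substitution axioms and subrule 4 can be exercised on the example expression used in the proof. The sketch below is illustrative only and rests on assumptions that are not in the original: expressions use a hypothetical tuple encoding (strings for atomic identifiers, `("call", f, [args])` for calls, `("forall", X, G)` and `("exists", X, G)` for quantification), and the undefined result of axioms D1 and D2 is modeled by the marker string `"undefined"`.

```python
UNDEFINED = "undefined"  # hypothetical marker for the result in axioms D1/D2


def not_in(d, e):
    """True when identifier d has no free use in expression e."""
    if isinstance(e, str):
        return d != e                               # subrules 11, 11b
    if e[0] == "call":
        return all(not_in(d, g) for g in e[2])      # subrules 13a, 13b
    _, x, g = e
    return d == x or not_in(d, g)                   # axioms A1, A2, B1, B2


def subst(e, d, h):
    """Expression e with h substituted for the free occurrences of d."""
    if isinstance(e, str):
        return h if e == d else e                   # subrule 21, axiom C
    if e[0] == "call":
        # Subrule 6: substitution distributes over the arguments.
        return ("call", e[1], [subst(g, d, h) for g in e[2]])
    tag, x, g = e
    if x == d:
        return e                                    # subrules 19, 19b
    if not_in(x, h):
        return (tag, x, subst(g, d, h))             # subrules 18a, 18a2
    return (tag, x, subst(g, d, UNDEFINED))         # axioms D1, D2


# G = (F1 A (F2 B forall X ((F3 X C)))), the example from the proof
G = ("call", "F1", ["A", ("call", "F2",
     ["B", ("forall", "X", ("call", "F3", ["X", "C"]))])])

# Subrule 4: D is not-in G, so the D-H substitution leaves G unchanged,
# even when H mentions the quantified identifier X.
H = ("call", "H1", ["X"])
assert not_in("D", G)
assert subst(G, "D", H) == G
```

Substituting for an identifier that does occur free shows the other branches: `subst(G, "C", "Y")` rewrites the innermost call to `(F3 X Y)` by subrule 18a, while `subst(G, "C", "X")` would capture X and so yields `(F3 X undefined)` under the D1 modeling assumed here.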

We  will  now  prove  subrule  3a: 


    $D \notin G \land D \notin H1 \rightarrow D \notin G|^{D1}_{H1}$


In a manner similar to the proof of subrule 4 we will repeatedly distribute the D1-H1
substitution by subrule 6, and the D not-in G by subrule 13a. When the substitution and
not-in distributions are forced as far as they can go, the object on which the substitution is
performed is one of four cases. If the object is an atomic identifier, we will call it A. If it is a
quantified expression, we will call the quantified identifier X. Either 1) $X \ne D1$ and $X \ne D$,
2) $X \ne D1$ and $X = D$, 3) $X = D1$, or 4) it is atomic. In case 1) we may apply subrule 18a or
18a2 or axiom D1 or D2 according to whether $X \notin H1$ and according to the type of
quantification. In applying any of the four rules we will move the substitution inside the
quantification. We can apply B1 or B2 to move the D not-in to the inside of the
quantification, and then continue to distribute the substitution and not-in. In case 2) we will
apply axiom A1 or A2 to establish that D is not-in the quantified object with the substitution
applied inside of it. But that is equal to the quantified object with the substitution outside by
18a, 18a2, D1, or D2. In case 3) we have established that D is not-in the quantified object
without the substitution applied, but that is equal to the quantified object with the
substitution applied by subrule 19 or 19b. In case 4) we have reached an atom that is not
inside D or D1 quantification. However we may have applied axiom D1 or D2 sometime so
that the substitution of D1-H1 may have become D1-undefined. In any case the result of the
substitution will be A or H1 or undefined. We have distributed the not-in to show that
$D \notin A$, a premise of the theorem is $D \notin H1$, and $D \notin \text{undefined}$. Therefore D is not-in the
result of the substitution.

In all of the cases where the substitution stops propagating we have shown that D is
not-in the substituted result. Therefore repeated application of subrule 13b and axioms A1
and A2 from all the parts of the substituted G back up to the whole will prove the conclusion
of subrule 3a.

Subrule 3b is simply a generalization of 3a for the case of multiple substitution. It is
easily proved by induction on the number of substitutions.

The proofs of subrules 5, 7, 8, 9, 12, 14, and 20 should follow in the same mold as those
given here for 3 and 4 (though they have not been carried out). They all deal with properties
of one, two, or three sequential substitutions, except the usual generalization to n substitutions
in 7.

We now return to quantification for a moment. Subrule 16 gives three variations on
moving irrelevant items outside the scope of quantification. We will take these properties as
axioms. The corresponding axioms hold for existential quantification, but we have not
encountered a need for them as subrules. Subrule 17a is a simple consequence of subrule 16b
and axiom A1. Again we have a generalization in 17b for the case of multiple
quantifications.


